Abstract —Strong secrecy capacity of compound wiretap channels
is studied. The known lower bounds for the secrecy
capacity of compound finite-state memoryless channels under 
discrete alphabets are extended to arbitrary uncertainty sets 
and continuous alphabets under the strong secrecy criterion. 
The conditions under which these bounds are tight are given. 
Under the saddle-point condition, the compound secrecy capacity 
is shown to be equal to that of the worst-case channel. Based on 
this, the compound Gaussian MIMO wiretap channel is studied 
under the spectral norm constraint and without the degradedness 
assumption. First, it is assumed that only the eavesdropper 
channel is unknown, but is known to have a bounded spectral 
norm (maximum channel gain). The compound secrecy capacity 
is established in a closed form and the optimal signaling is 
identified: the compound capacity equals the worst-case channel 
capacity thus establishing the saddle-point property; the optimal 
signaling is Gaussian and on the eigenvectors of the legitimate 
channel and the worst-case eavesdropper is isotropic. The eigenmode
power allocation somewhat resembles the standard water-filling
but is not identical to it. More general uncertainty sets are
considered and the existence of a maximum element is shown to 
be sufficient for a saddle-point to exist, so that signaling on the 
worst-case channel achieves the compound capacity of the whole 
class of channels. The case of rank-constrained eavesdropper 
is considered and the respective compound secrecy capacity is 
established. Subsequently, the case of additive uncertainty in the 
legitimate channel, in addition to the unknown eavesdropper 
channel, is studied. Its compound secrecy capacity and the 
optimal signaling are established in a closed-form as well, 
revealing the same saddle-point property. When a saddle-point 
exists under strong secrecy, strong and weak secrecy compound 
capacities are equal. 

Index Terms —Wiretap channel, compound channel, MIMO, 
strong secrecy, worst-case, saddle-point. 

I. Introduction

The nature of the wireless medium makes wireless communication
systems inherently vulnerable to eavesdropping.
In this context, the concept of information theoretic security 
is instrumental since it solely uses the physical properties of 
the wireless channel in order to establish security. Information 
theoretic security was initiated by Shannon [1] and studied 
later by Wyner, who introduced the now-popular wiretap channel
[2] modeling the simplest scenario involving security with
one legitimate transmitter-receiver pair and one wiretapper
(eavesdropper), from whom the message is to be kept secret. There is currently a growing
interest in information theoretic security, see e.g. [3-6]. 

Since spatial multiple-input multiple-output (MIMO) techniques
can improve the performance significantly [7], MIMO
architectures have been identified as indispensable for future 
wireless systems. Accordingly, investigation of information 
theoretic security for MIMO systems is becoming more and 
more attractive. The secrecy capacity of the Gaussian MIMO 
wiretap channel is established in [8-11] under full channel 
state information (CSI), where it turns out that Gaussian 
signaling is optimal. Subsequently, the optimal transmit covariance
matrix has been found under the matrix power
constraint in [12] and under the total power constraint for a 
number of special cases [8,9,13,14]. 

Due to the dynamic nature of the wireless medium, but also 
due to implementation issues, practical systems always suffer 
from channel uncertainty and estimation/feedback inaccuracy. 
Thus, the provision of accurate channel state information to the 
transmitter is a major challenge for wireless communication 
systems. Along with this, it is hardly possible to expect that 
the eavesdropper will share its channel with the transmitter 
to make the eavesdropping harder, which makes the perfect 
eavesdropper CSI model more than questionable. A reasonable 
and well-accepted approach to this problem is to assume that 
the exact channel realization is not known; it is only known 
that it remains fixed during the entire transmission and that it 
belongs to a known set of channels (uncertainty set), which 
results in the concept of compound channels [15,16]. 

The discrete memoryless compound wiretap channel with 
a countably-finite uncertainty set (i.e. finite-state channels) is 
studied in [17,18]. Its secrecy capacity is established under 
the degradedness assumption, where all possible realizations 
of the eavesdropper channel must be degraded with respect to 
all possible realizations of the legitimate channel. When this 
condition is not satisfied, only an achievable secrecy rate is 
given while the secrecy capacity for the general case remains 
unknown. 

The corresponding compound Gaussian MIMO wiretap 
channel with countably-finite uncertainty sets is analyzed in 
[17]. Similarly to the discrete memoryless case, its secrecy 
capacity is established, again, only under the degradedness 
assumption. When the channel is not degraded, the secrecy 
capacity itself remains unknown and only an achievable secrecy
rate is obtained. In [19], the special case of compound
wiretap channels with two possible channel states for the 
legitimate receiver and known eavesdropper channel is studied. 
Its secrecy capacity is established under the degradedness 
assumption and an achievable rate is given in the general (non-degraded)
case while the capacity is unknown. Interference
alignment for the compound Gaussian MIMO wiretap channel 
is explored in [20]. A Gaussian MIMO wiretap channel where 
the noiseless eavesdropper channel is arbitrarily varying is 
considered in [21]. Its achievable secrecy rate (i.e. lower bound 
to the secrecy capacity) is given and the secrecy degrees of 
freedom are established, while its capacity remains unknown. 
Since degrees of freedom require SNR $\to \infty$, it is not clear
what the finite-SNR implications are¹.

The discrete memoryless compound broadcast channel with 
confidential messages is studied in [23] and its strong secrecy 
capacity region is established in a multi-letter form. The 
corresponding Gaussian MIMO broadcast channel is considered
in [24] and its achievable degree-of-freedom region is
established, but not the capacity region itself. 

In all the previous studies, the compound secrecy capacity 
has been established only for the special case of degraded 
channels with countably-finite uncertainty sets [17-19]. Accordingly,
it is not clear if these results hold for more general
(e.g. uncountable) or arbitrary uncertainty sets as well or how 
these results extend to non-degraded channels. For such non-degraded
channels, only an achievable secrecy rate is obtained
with the consequence that it is not clear how far away this 
rate is from its actual capacity. The achievable secrecy rate is 
studied from a worst-case secrecy rate maximization point of 
view in [25]. Another approach is taken in [20,21] by studying 
the behavior for SNR $\to \infty$. However, this does not provide
any insights on the secrecy capacity or its behavior for the
practically relevant case of finite SNR.

In this paper, we address all these limitations and establish 
the (strong) secrecy capacity of compound Gaussian MIMO 
channels for a broad class of uncertainty sets (not only finite or 
countable) and without the degradedness assumption. We make 
use of the compound wiretap model, where the legitimate 
channel is perfectly known and the eavesdropper channel is 
not known to the transmitter but is known to have a bounded 
spectral norm (maximum channel gain), both being fixed 
during the whole transmission duration. This represents a 
quasi-static scenario where the eavesdropper cannot approach 
the transmitter closer than a certain protection distance so 
that its channel gain is bounded (due to the propagation 
path loss) but is unconstrained otherwise. This automatically 
implies only a minimal eavesdropper CSI at the transmitter, 
which reflects well the natural eavesdropper desire to be 
confidential and its lack of cooperation. Throughout the paper, 
full CSI at the eavesdropper is assumed (the safest assumption 
from the secrecy perspective). We make no assumptions of 
degradedness. The eavesdropper channel uncertainty scenario 
is further extended to the case where the legitimate channel is 
also allowed to have (additive) uncertainty, which represents 
channel estimation and feedback link limitations, and to the 
case of more general eavesdropper uncertainty sets, which may 
be non-isotropic. 

The compound secrecy capacity is established in two main
steps. First, we consider the corresponding discrete memoryless
channel (DMC) in Section II. For this channel model,
an achievable (strong) secrecy rate was obtained in [18] for
countably-finite uncertainty sets. Building on this result, we
establish a lower bound for the compound (strong) secrecy
capacity under arbitrary uncertainty sets (not necessarily finite
or countable) in Theorem 2, which is subsequently extended
to continuous alphabets in Theorem 3 using the set partitioning
(quantization) arguments adapted to compound channels
in [26]. The conditions under which these bounds are tight are
given, thus establishing the secrecy capacity. Under the saddle-point
condition, the compound secrecy capacity is shown to
be equal to that of the worst-case channel (so that any code
designed for the worst-case channel also works on the entire
class of channels in the uncertainty set).

¹Two systems having the same degrees of freedom may have vastly
different capacities, even at high SNR, see e.g. [22].

Secondly, the (strong) secrecy capacity of the compound 
Gaussian MIMO channel is established in Theorem 4 for 
the eavesdropper uncertainty with bounded spectral norm 
and without the degradedness assumption. This is done by 
establishing first an achievable rate of this channel in Corollary
1². Then, in Section V, the worst-case secrecy capacity (i.e.
the capacity of the worst-case channel in the set) is obtained 
and the saddle-point property is established in the form 
max min = min max, where the maximization is over the 
transmit covariance and minimization is over the eavesdropper 
channel uncertainty. The saddle-point property has the well- 
known game-theoretic interpretation: the mini-max zero-sum 
game is between the transmitter (who controls the transmitted 
signal distribution) and the eavesdropper (who controls the 
channel); neither player can deviate from an optimal strategy 
without incurring penalty provided the other player follows it. 

Combining all these, we establish the secrecy capacity of 
the compound Gaussian MIMO channel in a closed-form, 
which also equals the worst-case capacity, so that a code 
designed for the worst-case channel works over the whole 
class of channels as well. The optimal signaling is Gaussian
and on the eigenvectors of the legitimate channel, with
power allocation somewhat similar but not identical to the 
regular water-filling. The worst-case eavesdropper is isotropic 
with the maximum allowed channel gain. This result is then 
extended to a broader class of compound channels, where 
the uncertainty set is only required to have a dominant 
(maximum/maximal) element and may be non-isotropic. It 
is shown that the existence of a maximum element in the 
eavesdropper uncertainty set is sufficient for a saddle-point 
to exist, so that the compound capacity equals the worst-case
one and signaling on the worst-case channel achieves the
capacity of the whole class of channels. The high/low SNR 
regimes are considered and the condition for beamforming 
optimality is given. When the eavesdropper uncertainty is 
sufficiently large, beamforming is optimal at any SNR. The 
case of rank-constrained eavesdropper is considered, motivated 
by the scenario where the transmitter is a base station with a 
large number of antennas while the receiver/eavesdropper are 
handsets with a small number of antennas. Under this non-convex
constraint (in addition to the convex spectral norm
constraint), there is no maximum element in the uncertainty
set, yet the saddle-point property is shown to hold and the
compound secrecy capacity is established. Subsequently, a
more general case of two-sided channel uncertainty is studied
in Section VI, where the legitimate channel is also allowed to
have (additive) uncertainty. This reflects the assumption that
the legitimate receiver will share its CSI with the transmitter,
but limitations in feedback link and channel estimation result
in channel uncertainty. The corresponding compound secrecy
capacity is established and shown to be equal to the secrecy
capacity of the worst-case channel in the uncertainty set, so
that the saddle-point property still holds. The optimal signaling
is still on the eigenmodes of the legitimate channel and the
worst-case eavesdropper is isotropic.

²Unlike [17], this is done for arbitrary (compact) uncertainty sets, not just
countable or finite, and under the strong secrecy constraint.

It was established in [27,28] that the strong and weak
secrecy capacities are the same for regular (non-compound
or known) channels. Section VII demonstrates that the same
holds for compound channels if the saddle-point property
holds under strong secrecy.

Finally, Section VIII concludes the paper. 

Notations: Discrete random variables are denoted by capital
letters and their realizations and ranges by lower case and
script letters, respectively; scalars, vectors, and matrices are
denoted by lower case letters, bold lower case letters, and
bold capital letters; $\mathbb{N}$, $\mathbb{R}_+$, and $\mathbb{C}$ are the sets of positive
integers, non-negative real, and complex numbers, respectively;
$|\mathcal{A}|$ and $\mathcal{A}^c$ denote the cardinality and the complement of the
set $\mathcal{A}$; $I(\cdot;\cdot)$ is the mutual information and $H_2(\cdot)$ is the binary
entropy function; $\mathcal{P}(\cdot)$ is the set of all probability distributions
and $E\{\cdot\}$ is statistical expectation; $X \to Y \to Z$ denotes a
Markov chain of random variables $X$, $Y$, and $Z$ in this order;
$\mathbf{A}^T$, $\mathbf{A}^+$, and $|\mathbf{A}|$ are the transposition, Hermitian conjugation,
and determinant of $\mathbf{A}$; $\mathrm{tr}\,\mathbf{A}$ is the trace of the matrix $\mathbf{A}$ and
$\mathrm{diag}(\mathbf{a})$ is a diagonal matrix with elements given by $\mathbf{a}$; $\mathbf{A} \ge \mathbf{B}$
means the matrix $\mathbf{A} - \mathbf{B}$ is positive semi-definite; $\mathbf{I}$ is the
identity matrix.

II. Discrete Memoryless Channels 

In this section we consider discrete memoryless channels 
(DMCs) with finite input and output alphabets. Building on 
earlier results in [18] for finite-state channels, an achievable 
secrecy rate is established for the general case of arbitrary 
uncertainty sets (not limited to finite or countable), which is 
subsequently extended to continuous alphabets in Section III. 

A. Compound Wiretap Channel 

Let $\mathcal{X}$ and $\mathcal{Y}$, $\mathcal{Z}$ be countably-finite input and output sets
and $\mathcal{S}$ be a set which will model the channel uncertainty.
The channels to the legitimate receiver and the eavesdropper
(wiretapper) are given by $W_s : \mathcal{X} \times \mathcal{S} \to \mathcal{P}(\mathcal{Y})$ and
$V_s : \mathcal{X} \times \mathcal{S} \to \mathcal{P}(\mathcal{Z})$, respectively, where $s \in \mathcal{S}$ is a channel state. For
a fixed state $s \in \mathcal{S}$, input and output sequences $x^n \in \mathcal{X}^n$ and
$y^n \in \mathcal{Y}^n$, $z^n \in \mathcal{Z}^n$ of block length $n$, the discrete memoryless
channels are given by $W_s^n(y^n|x^n) = \prod_{i=1}^{n} W_s(y_i|x_i)$ and
$V_s^n(z^n|x^n) = \prod_{i=1}^{n} V_s(z_i|x_i)$. The channels are assumed to be
quasi-static: $s$ is selected at the beginning and is held constant
during the entire transmission.
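As a small illustration of the memoryless product formula for a fixed state, the following sketch evaluates the block transition probability of a toy channel. The binary symmetric channel and the function name `memoryless_prob` are assumptions made only for this example and do not appear in the paper.

```python
from itertools import product
from math import prod

def memoryless_prob(W, x_seq, y_seq):
    """Return prod_i W(y_i | x_i), where W[x][y] is the single-letter law."""
    if len(x_seq) != len(y_seq):
        raise ValueError("input and output sequences must have equal length")
    return prod(W[x][y] for x, y in zip(x_seq, y_seq))

# Hypothetical fixed state: binary symmetric channel with crossover 0.1.
bsc = [[0.9, 0.1],
       [0.1, 0.9]]

p_block = memoryless_prob(bsc, (0, 0, 0), (0, 1, 0))

# For a fixed input sequence the block law sums to one over all outputs.
total = sum(memoryless_prob(bsc, (0, 0, 0), y) for y in product((0, 1), repeat=3))
```

Here `p_block` is the probability of one bit flip in the middle position, and `total` confirms that the product construction yields a valid distribution on output sequences.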


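Definition 1. The discrete memoryless compound wiretap
channel $\mathfrak{W}$ is given by

$$\mathfrak{W} = \{(W_s, V_s) : s \in \mathcal{S}\}.$$

Remark 1. This includes the widely adopted model of the
form $\mathfrak{W} = \{(W_{s_1}, V_{s_2}) : s_1 \in \mathcal{S}_1, s_2 \in \mathcal{S}_2\}$ with $\mathcal{S}_1 \neq \mathcal{S}_2$ as
one can always construct a new set of the form $\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2$.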

Definition 2. An $(n, M_n)$-code $C_n$ for the compound wiretap
channel consists of a stochastic encoder at the transmitter

$$E : \mathcal{M}_n \to \mathcal{P}(\mathcal{X}^n), \qquad (1)$$

i.e., a stochastic matrix, with a set of messages $\mathcal{M}_n = \{1, \ldots, M_n\}$
and a decoder at the legitimate receiver described
by a collection of disjoint decoding sets

$$\{\mathcal{D}_m \subset \mathcal{Y}^n : m \in \mathcal{M}_n\}, \qquad (2)$$

so that $\hat{m} = m$ if $y^n \in \mathcal{D}_m$, where $\hat{m}$ is the decoded message
at the receiver.

The encoder in (1) is allowed to be stochastic (this in
fact is essential for achieving secrecy), which means that
it is specified by conditional probabilities $E(x^n|m)$ with
$\sum_{x^n \in \mathcal{X}^n} E(x^n|m) = 1$ for each $m \in \mathcal{M}_n$. Then, $E(x^n|m)$
denotes the probability that the message $m \in \mathcal{M}_n$ is encoded
as $x^n \in \mathcal{X}^n$.
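To make the notion of a stochastic encoder concrete, the sketch below stores the conditional probabilities $E(x^n|m)$ as one distribution per message and draws a codeword at random. The toy encoder table and the helper names `is_stochastic` and `encode` are hypothetical and serve only as an illustration.

```python
import random

# Hypothetical toy encoder: 2 messages, block length n = 2, binary inputs.
# ENCODER[m][x_seq] plays the role of E(x^n | m).
ENCODER = {
    0: {(0, 0): 0.5, (1, 1): 0.5},
    1: {(0, 1): 0.5, (1, 0): 0.5},
}

def is_stochastic(encoder, tol=1e-12):
    """Check that every row E(. | m) is a probability distribution."""
    return all(
        all(p >= 0.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in encoder.values()
    )

def encode(encoder, m, rng):
    """Draw a codeword x^n according to E(. | m)."""
    codewords = list(encoder[m].keys())
    weights = list(encoder[m].values())
    return rng.choices(codewords, weights=weights, k=1)[0]

rng = random.Random(0)          # fixed seed for reproducibility
x_seq = encode(ENCODER, 1, rng)
valid = is_stochastic(ENCODER)
```

The same message can map to different codewords on different uses, which is the local randomness that a deterministic encoder lacks.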

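Then for an $(n, M_n)$-code $C_n$, the maximum probability of
decoding error at the legitimate receiver is given by

$$\bar{e}_n = \sup_{s \in \mathcal{S}} \max_{m \in \mathcal{M}_n} \sum_{x^n \in \mathcal{X}^n} W_s^n(\mathcal{D}_m^c | x^n) E(x^n | m). \qquad (3)$$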

Remark 2. Throughout the whole paper we assume that the 
transmitter and legitimate receiver do not have full CSI, i.e., 
they do not know the actual realization s G S but do know the 
uncertainty set S. Accordingly, encoder (1) and decoder (2) 
are universal and do not depend on the particular realization. 
On the other hand, we make a conservative (and safest from 
secrecy perspective) assumption that the eavesdropper has 
perfect CSI of both channels (to the legitimate receiver and 
its own). 

To keep the transmitted message secret from the eavesdropper
for all channel realizations $s \in \mathcal{S}$, we require the
information leaked to the eavesdropper to be arbitrarily small,
i.e.

$$\sup_{s \in \mathcal{S}} I(M; Z_s^n) \le \epsilon_n \qquad (4)$$

for some $\epsilon_n > 0$ and $\epsilon_n \to 0$ as $n \to \infty$, where $M$ is the
random variable uniformly distributed over the set of messages
$\mathcal{M}_n$ and $Z_s^n = [Z_{s,1}, Z_{s,2}, \ldots, Z_{s,n}]$ is the eavesdropper
channel output for channel realization $s \in \mathcal{S}$. This criterion is
known as strong secrecy [27,28].

Remark 3. The vanishing information leakage to the eavesdropper
implies that its bit error probability $P_b$ approaches 1/2
as $n \to \infty$ (and thus codeword error probability approaches
1) and the speed of convergence depends on the secrecy
criterion adopted. In particular, it can be shown (using Fano's
inequality) that

$$P_b = \frac{1}{2} - o(1) \quad \text{under weak secrecy}, \qquad (5)$$

$$P_b = \frac{1}{2} - o(1/\sqrt{n}) \quad \text{under strong secrecy}, \qquad (6)$$

so that $P_b \to 1/2$ in any case, but the speed of convergence
can be arbitrarily slow under weak secrecy, while it is at least
as $1/\sqrt{n}$ under strong secrecy. Using the recent result in [18]
on exponential convergence of information leakage to zero for
the discrete memoryless channel (see (10)), it can be further
shown that

$$P_b = \frac{1}{2} - O(e^{-an}), \qquad (7)$$

i.e. exponentially fast in that scenario, where $a > 0$. This
provides an operational meaning for secrecy criteria.


Definition 3. A non-negative number $R_s$ is an achievable
secrecy rate if for all $\delta > 0$, there is an $n(\delta) \in \mathbb{N}$ and
a sequence of $(n, M_n)$-codes with maximum error
probability $\bar{e}_n$ such that for all $n \ge n(\delta)$,

$$\frac{1}{n} \log M_n \ge R_s - \delta,$$

$$\sup_{s \in \mathcal{S}} I(M; Z_s^n) \le \epsilon_n,$$

and $\bar{e}_n, \epsilon_n \to 0$ as $n \to \infty$. The compound secrecy capacity
$C_c$ of the wiretap channel $\mathfrak{W}$ is given by the supremum of all
achievable secrecy rates $R_s$.


B. Countably-Finite Uncertainty Sets 

The discrete memoryless wiretap channel with a countably- 
finite uncertainty set (i.e. finite-state channels) is studied in 
[17,18]. In particular, the following achievable secrecy rate 
was established in [18, Theorem 2]. 

Theorem 1 ([18]). The compound secrecy capacity $C_c$ of the
discrete memoryless wiretap channel $\mathfrak{W}$ is lower-bounded as
follows:

$$C_c \ge \max_{P_X} \Big( \min_{s \in \mathcal{S}} I(X; Y_s) - \max_{s \in \mathcal{S}} I(X; Z_s) \Big) \qquad (8)$$

where the uncertainty set $\mathcal{S}$ is countably-finite. Here, the
random variables $Y_s$ and $Z_s$ denote the outputs of the corresponding
channels $W_s$ and $V_s$ for $s \in \mathcal{S}$.

Furthermore, it has been shown in [18, Theorem 2] that the
secrecy rate given in (8) is achieved with maximum probability
of error of the form

$$\bar{e}_n \le |\mathcal{S}|^{1/4} 2^{-n\alpha} \qquad (9)$$

and the secrecy constraint behaving as

$$\max_{s \in \mathcal{S}} I(M; Z_s^n) \le 2^{-n\beta} \qquad (10)$$

for some $\alpha, \beta > 0$. Thus, the bounds for both criteria, i.e., reliability (3) and
secrecy (4), decrease exponentially fast with increasing block
length $n$. In addition, neither bound depends on the
particular channel realization. These two properties will be
indispensable for extending this result from countably-finite
to arbitrary uncertainty sets.
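The lower bound (8) is straightforward to evaluate numerically for small alphabets. The following sketch does so by a grid search over binary input distributions; the toy family of binary symmetric channels, the grid resolution, and all function names are assumptions chosen for illustration and are not taken from [18].

```python
from math import log2

def mutual_information(px, W):
    """I(X;Y) in bits for input law px and channel matrix W[x][y]."""
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(len(W[0]))]
    mi = 0.0
    for x, p_x in enumerate(px):
        for y, w in enumerate(W[x]):
            if p_x > 0.0 and w > 0.0:
                mi += p_x * w * log2(w / py[y])
    return mi

def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return [[1.0 - p, p], [p, 1.0 - p]]

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def compound_lower_bound(legit, eaves, grid=1001):
    """max over P_X of (min_s I(X;Y_s) - max_s I(X;Z_s)) on a uniform grid."""
    best = float("-inf")
    for k in range(grid):
        q = k / (grid - 1)
        px = [1.0 - q, q]
        rate = (min(mutual_information(px, W) for W in legit)
                - max(mutual_information(px, V) for V in eaves))
        best = max(best, rate)
    return best

# Hypothetical finite-state example: two legitimate and two eavesdropper states.
legit_states = [bsc(0.05), bsc(0.10)]
eaves_states = [bsc(0.20), bsc(0.30)]
bound = compound_lower_bound(legit_states, eaves_states)
```

In this symmetric example the maximizing input is uniform, so the bound reduces to `h2(0.20) - h2(0.10)`, i.e. the secrecy rate of the least favourable pair of states.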


C. Arbitrary Uncertainty Sets 

The result above applies to finite-state channels (i.e., 
countably-finite uncertainty sets) and discrete alphabets. Here, 
we extend it to arbitrary uncertainty sets, which are not 
required to be finite or countable. Subsequently, it will be 
extended to continuous alphabets and compact uncertainty sets 
in Section III. 

To accomplish this, we adapt the ideas from Blackwell,
Breiman, and Thomasian [15] and approximate arbitrary compound
wiretap channels by suitably chosen finite-state channels.

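Lemma 1. Let $\mathfrak{W} = \{(W_s, V_s) : s \in \mathcal{S}\}$ be a discrete
memoryless wiretap channel with arbitrary uncertainty set
$\mathcal{S}$. For every integer $L \ge 2|\mathcal{Y}|^2|\mathcal{Z}|^2$ there is a compound
wiretap channel $\mathfrak{W}_L = \{(\widetilde{W}_{s'}, \widetilde{V}_{s'}) : s' \in \mathcal{S}_L\}$ with a
countably-finite uncertainty set $\mathcal{S}_L$, $|\mathcal{S}_L| \le (L+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$,
such that any $(W_s, V_s) \in \mathfrak{W}$ is closely approximated by some
$(\widetilde{W}_{s'}, \widetilde{V}_{s'}) \in \mathfrak{W}_L$ so that

(a) for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$,

$$|W_s(y|x) - \widetilde{W}_{s'}(y|x)| \le |\mathcal{Y}||\mathcal{Z}|/L, \qquad (11a)$$

$$|V_s(z|x) - \widetilde{V}_{s'}(z|x)| \le |\mathcal{Y}||\mathcal{Z}|/L, \qquad (11b)$$

$$W_s(y|x) \le 2^{2|\mathcal{Y}|^2|\mathcal{Z}|^2/L}\, \widetilde{W}_{s'}(y|x), \qquad (11c)$$

$$V_s(z|x) \le 2^{2|\mathcal{Y}|^2|\mathcal{Z}|^2/L}\, \widetilde{V}_{s'}(z|x). \qquad (11d)$$

(b) For any input distribution $P_X \in \mathcal{P}(\mathcal{X})$,

$$|I(X; Y_s) - I(X; \widetilde{Y}_{s'})| \le 2(|\mathcal{Y}||\mathcal{Z}|)^{3/2}/L^{1/2}, \qquad (12a)$$

$$|I(X; Z_s) - I(X; \widetilde{Z}_{s'})| \le 2(|\mathcal{Y}||\mathcal{Z}|)^{3/2}/L^{1/2}. \qquad (12b)$$

Proof: The proof can be found in Appendix A. ■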

This shows that we can approximate an arbitrary compound
wiretap channel $\mathfrak{W}$ by a finite-state one $\mathfrak{W}_L$ so that any
channel in $\mathfrak{W}$ is close in several senses to one of the newly
constructed channels in $\mathfrak{W}_L$ (they can be made arbitrarily close
by increasing the number of quantization levels $L$, which we
exploit below). The next lemma shows that if there is a "good"
code for a wiretap channel, then the same code can be used
for all wiretap channels in its neighborhood.

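Lemma 2. Let $(W_s, V_s)$ and $(\widetilde{W}_{s'}, \widetilde{V}_{s'})$ be two wiretap channels
and $L > 0$ such that Lemma 1 holds. Then any $(n, M_n)$-code
for $(\widetilde{W}_{s'}, \widetilde{V}_{s'})$ is also an $(n, M_n)$-code for $(W_s, V_s)$ with

$$\bar{e}_n \le 2^{2n|\mathcal{Y}|^2|\mathcal{Z}|^2/L}\, \tilde{e}_n \qquad (13)$$

and

$$|I(M; Z_s^n) - I(M; \widetilde{Z}_{s'}^n)| \le 4n|\mathcal{Y}||\mathcal{Z}|^2 \log|\mathcal{Z}|/L + 4nH_2(|\mathcal{Y}||\mathcal{Z}|^2/L), \qquad (14)$$

with $H_2(\cdot)$ the binary entropy function. Here, $\bar{e}_n$ and $\tilde{e}_n$ denote
the maximum probabilities of error for the channels $W_s$ and
$\widetilde{W}_{s'}$, respectively, cf. (3).

Proof: The proof can be found in Appendix B. ■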
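The quantization idea behind the finite-state approximation can be illustrated on a single row of a channel matrix. The sketch below rounds each transition probability down to a multiple of $1/L$ and returns the leftover mass to the largest entry; this is a simplified stand-in chosen for illustration, not the construction of Appendix A, and the name `quantize_row` is hypothetical.

```python
from fractions import Fraction

def quantize_row(row, L):
    """Round a probability vector to multiples of 1/L while keeping sum 1.

    Each entry is floored to the grid {0, 1/L, ..., 1}; the leftover mass
    is then added to the largest original entry.
    """
    quantized = [Fraction(int(p * L), L) for p in row]
    largest = max(range(len(row)), key=lambda i: row[i])
    quantized[largest] += 1 - sum(quantized)
    return quantized

# Hypothetical transition probabilities W(. | x) for one input letter x.
row = [0.62, 0.27, 0.11]
L = 10
q_row = quantize_row(row, L)

# Largest entry-wise deviation caused by the quantization.
max_dev = max(abs(float(q) - p) for q, p in zip(q_row, row))
```

Every quantized entry lies on a grid of $L+1$ points, so the number of distinct quantized channels is finite, and the entry-wise deviation here stays below `len(row) / L`, in the spirit of (11a).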

Remark 4. The tight bound in (14) is established based on a
recent result on the continuity of the secrecy capacity of compound
wiretap channels [29], which in turn was established
using a technique developed for quantum channels in [30].
Following instead the classical approach of [15] by applying
(12b) leads to a loose bound

$$|I(M; Z_s^n) - I(M; \widetilde{Z}_{s'}^n)| \le 2(|\mathcal{Y}||\mathcal{Z}|)^{3n/2}/L^{1/2}, \qquad (15)$$

which increases exponentially fast in the block length $n$. This
then prohibits the proof of Theorem 2 (using the number
$L$ of quantization levels that scales up exponentially in $n$
would not help since the bound in (21) would diverge). Thus,
using the tight bound in (14) (which increases only linearly
in $n$) is essential. This bound also reveals the scaling of the
number of quantization levels $L$ with $n$ required to make
the approximation error arbitrarily small. If only weak secrecy
is of interest, then the normalized difference is bounded by
a constant independent of $n$, which can be made as small
as desired by using sufficiently large but fixed $L$, while
strong secrecy requires $L$ to scale faster than $n \log n$ for the
approximation error to become arbitrarily small.
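The scaling requirement on $L$ can be checked numerically by evaluating the right-hand side of (14) for different growth rates of $L(n)$. The sketch below assumes binary output alphabets and an arbitrary constant $a = 64$; these values and the function names are illustrative assumptions only.

```python
from math import log, log1p, log2

def h2(x):
    """Binary entropy in bits, using log1p for accuracy at small x."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * log(x) + (1.0 - x) * log1p(-x)) / log(2.0)

def leakage_gap_bound(n, L, ny, nz):
    """Right-hand side of (14) for output alphabet sizes ny = |Y|, nz = |Z|."""
    eps = ny * nz ** 2 / L
    if eps >= 1.0:
        raise ValueError("L is too small for the bound to be meaningful")
    return 4.0 * n * eps * log2(nz) + 4.0 * n * h2(eps)

NY, NZ, A = 2, 2, 64           # assumed toy alphabet sizes and constant a

# L(n) = a * n^2: the bound vanishes as n grows.
quad_small = leakage_gap_bound(10 ** 3, A * 10 ** 6, NY, NZ)
quad_large = leakage_gap_bound(10 ** 6, A * 10 ** 12, NY, NZ)

# L(n) = a * n: the bound does not vanish (it grows like log n).
lin_small = leakage_gap_bound(10 ** 3, A * 10 ** 3, NY, NZ)
lin_large = leakage_gap_bound(10 ** 6, A * 10 ** 6, NY, NZ)
```

With quadratic growth the bound shrinks as $n$ increases, whereas with linear growth the entropy term keeps it bounded away from zero, which is consistent with the requirement that $L$ grow faster than $n \log n$.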

The two lemmas above allow one to extend the finite-state 
result in Theorem 1 to arbitrary uncertainty sets. To proceed 
further, we need the following definitions that establish an 
ordering of compound wiretap channels. 

Definition 4. A compound DMC $V_{s_2}$ is said to be noisier than
a compound DMC $W_{s_1}$ if

$$I(U; Y_{s_1}) \ge I(U; Z_{s_2}) \qquad (16)$$

for any aggregate channel state $s = (s_1, s_2) \in \mathcal{S}$, any random
variable $U$ and any DMC $U \to X$ such that $U \to X \to (Y_{s_1}, Z_{s_2})$
is a Markov chain.


Definition 5. Compound DMC $V_{s_2}$ is said to be (physically)
degraded with respect to compound DMC $W_{s_1}$ if $X \to Y_{s_1} \to Z_{s_2}$
is a Markov chain for any channel state $s = (s_1, s_2) \in \mathcal{S}$
and any input $X$.

These definitions are an extension of the corresponding
definitions for non-compound (single-state) channels, see e.g.
[6,31]. Similarly to the single-state channels, it can be shown
that "degraded" implies "noisier", but the converse is not true,
i.e. the latter requirement is weaker than the former (so that
there are channels that are "noisier" but not "degraded"). An
equivalent to the less noisy requirement, which is somewhat
easier to verify, can be established in the same way as for the
single-state channels (see [32]).

Proposition 1. The compound DMC $V_{s_2}$ is noisier than the
compound DMC $W_{s_1}$ if and only if $I(X; Y_{s_1}) - I(X; Z_{s_2})$
is concave in the input distribution $P_X$ for any channel state
$s = (s_1, s_2) \in \mathcal{S}$.

The compound secrecy capacity of the discrete memoryless
wiretap channel $\mathfrak{W}$ can now be characterized as follows.

Theorem 2. The compound secrecy capacity $C_c$ of the discrete
memoryless wiretap channel $\mathfrak{W}$ is bounded as follows:

$$C_c \ge \sup_{P_X} \Big( \inf_{s_1 \in \mathcal{S}_1} I(X; Y_{s_1}) - \sup_{s_2 \in \mathcal{S}_2} I(X; Z_{s_2}) \Big) \qquad (17)$$

for any uncertainty set $\mathcal{S}$ (not necessarily finite or countable),
and the equality is attained if $V_{s_2}$ is noisier than $W_{s_1}$.

Proof: The proof of the lower bound is based on Lemmas
1 and 2. We approximate the arbitrary compound wiretap
channel $\mathfrak{W}$ by a finite-state one $\mathfrak{W}_L$ with the number of
quantization levels $L = L(n)$, which is selected in such a
way that:

1) it satisfies the condition of Lemma 1,

2) the secrecy rate supported by the approximated channel
approaches that of the original channel arbitrarily closely
(so that $L(n) \to \infty$ as $n \to \infty$),

3) maximum error probability approaches 0 as $n \to \infty$
(so that $L(n) > 2|\mathcal{Y}|^2|\mathcal{Z}|^2/\alpha$ but $L(n)$ has to increase
slower than exponentially),

4) secrecy criterion approaches 0 as $n \to \infty$ (so that $L(n)$
has to increase faster than $n \log n$).

Note that criterion 4) dictates the fastest increase of $L(n)$ and
using the classical approach of [15] would not satisfy it. The
following analysis shows that $L = a \cdot n^2$ is a proper choice
for the number of quantization levels, where

$$a > 2|\mathcal{Y}|^2|\mathcal{Z}|^2 \max\{1, 1/\alpha\}, \qquad (18)$$

and $\alpha$ is as in (9).

For each $(W_s, V_s) \in \mathfrak{W}$, we select a sufficiently good
approximation $(\widetilde{W}_{s'}, \widetilde{V}_{s'})$ according to Lemma 1. The corresponding
finite-state compound channel is denoted by $\mathfrak{W}_L$
and the countably-finite uncertainty set by $\mathcal{S}_L$, where
$|\mathcal{S}_L| \le (L+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$.

Next, we check the reliability part. Fix input distribution
$P_X$ and set the secrecy rate

$$R_s = \min_{s' \in \mathcal{S}_L} I(X; \widetilde{Y}_{s'}) - \max_{s' \in \mathcal{S}_L} I(X; \widetilde{Z}_{s'}) - \epsilon \qquad (19)$$

for some $\epsilon > 0$. From Theorem 1, there exists an $(n, M_n)$-code
for $\mathfrak{W}_L$ with probability of error

$$\tilde{e}_n \le |\mathcal{S}_L|^{1/4} 2^{-n\alpha} \le (L+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|/4}\, 2^{-n\alpha} \to 0 \text{ as } n \to \infty, \qquad (20)$$

where the steps follow from (9), $|\mathcal{S}_L| \le (L+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$,
Lemma 1, and $L = a \cdot n^2$. Furthermore, from Lemma 1, for
each $W_s \in \mathfrak{W}$ there is an appropriate $\widetilde{W}_{s'} \in \mathfrak{W}_L$ such that
$W_s(y|x) \le 2^{2|\mathcal{Y}|^2|\mathcal{Z}|^2/L}\, \widetilde{W}_{s'}(y|x)$ for all $x, y$. Thus, Lemma
2 implies that the code for $\mathfrak{W}_L$ is also a code for $\mathfrak{W}$ with
probability of error

$$\bar{e}_n \le (L+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|/4}\, 2^{-n(\alpha - 2|\mathcal{Y}|^2|\mathcal{Z}|^2/L)}. \qquad (21)$$


Since $L = a \cdot n^2$, we have $\bar{e}_n \to 0$ as $n \to \infty$. This means
the code constructed for the approximated channel $\mathfrak{W}_L$ also
satisfies the reliability criterion for the original channel $\mathfrak{W}$.
Thus, it remains to show that the rate of this code is arbitrarily
close to the desired rate and the strong secrecy condition
is satisfied. From Lemma 1(b), one obtains, for any input
distribution $P_X$,

$$\inf_{s \in \mathcal{S}} I(X; Y_s) - \sup_{s \in \mathcal{S}} I(X; Z_s) - \Big( \min_{s' \in \mathcal{S}_L} I(X; \widetilde{Y}_{s'}) - \max_{s' \in \mathcal{S}_L} I(X; \widetilde{Z}_{s'}) \Big) \le 4(|\mathcal{Y}||\mathcal{Z}|)^{3/2}/L^{1/2}. \qquad (22)$$

Thus, the difference between the rate achieved by the code for
the approximated finite-state channel $\mathfrak{W}_L$ and the desired rate
for the original channel $\mathfrak{W}$ is arbitrarily small, since $L \to \infty$
as $n \to \infty$.

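It remains to check that the secrecy constraint is also
satisfied. The code above for the approximated finite-state
channel $\mathfrak{W}_L$ has $\max_{s' \in \mathcal{S}_L} I(M; \widetilde{Z}_{s'}^n) \le 2^{-n\beta}$, cf. Theorem
1 and (10), so that invoking Lemma 2, one obtains

$$\sup_{s \in \mathcal{S}} I(M; Z_s^n) \le \max_{s' \in \mathcal{S}_L} I(M; \widetilde{Z}_{s'}^n) + 4n|\mathcal{Y}||\mathcal{Z}|^2 \log|\mathcal{Z}|/L + 4nH_2(|\mathcal{Y}||\mathcal{Z}|^2/L)$$

$$\le 2^{-n\beta} + 4|\mathcal{Y}||\mathcal{Z}|^2 \log|\mathcal{Z}|/(an) + 4nH_2(|\mathcal{Y}||\mathcal{Z}|^2/(an^2)) \to 0 \text{ as } n \to \infty, \qquad (23)$$

where we have used $nH_2(a/n^2) \to 0$ as $n \to \infty$ for any
$a > 0$. Thus, the information leaked to the eavesdropper
is also arbitrarily small.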

To establish the equality part under the noisier condition,
observe that, by extending the proof of the converse in
Theorem 3 of [31] to the compound setting and requiring
the encoder to be independent of the actual channel states,
it can be shown that any achievable secrecy rate is bounded
as follows

$$R_s \le I(X; Y_{s_1}) - I(X; Z_{s_2}) \qquad (24)$$

for any channel state $(s_1, s_2)$, where the input $X$ is induced
by the encoder, so that

$$R_s \le \inf_{s} \big( I(X; Y_{s_1}) - I(X; Z_{s_2}) \big), \qquad (25)$$

from which it follows that

$$C_c \le \sup_{P_X} \inf_{s} \big( I(X; Y_{s_1}) - I(X; Z_{s_2}) \big) \qquad (26)$$

and thus establishes the equality. ■

Remark 5. The proof of Theorem 2 reveals that the required
scaling of $L(n)$ depends on the secrecy criterion adopted, cf.
in particular (23): this requires $L(n)$ to scale faster than $n \log n$
and motivates the convenient choice of $L(n) = a \cdot n^2$ for some
$a$ satisfying (18) (in fact, using $L(n) = a \cdot n^{1+\delta}$ with any $\delta > 0$ would
work as well). Requiring weak secrecy instead allows for the 
quantization number L{n) to increase arbitrarily slowly in n 
(e.g. as logn or log log n). 

Remark 6. It should be emphasized that two properties of the
bounds on the probability of error (9) and the secrecy (10) are indispensable
to extend the result from finite uncertainty sets to arbitrary
uncertainty sets: their exponentially fast decrease and
their independence of the actual channel realization. Thus, such
bounds have to be established carefully for the finite case since
otherwise an extension to the arbitrary case is not possible.
Moreover, the approximation must be done carefully enough
(e.g. as in (14) with $L(n) = a \cdot n^2$) to ensure that both
the secrecy and reliability criteria are still valid after the
approximation.

Remark 7. Since each degraded channel is also “noisier”, the 
equality in Theorem 2 also holds for degraded channels. 

To proceed further, we need the following definitions. 

Definition 6. Compound DMC $V_{s_2}$ is said to be less capable
than compound DMC $W_{s_1}$ if for every $P_X$ and any channel
state $(s_1, s_2) \in \mathcal{S}$

$$I(X; Y_{s_1}) \ge I(X; Z_{s_2}). \qquad (27)$$

This definition extends the corresponding definition in [6] 
to the compound channel setting. Following the same line of 
analysis as for single-state channels, it can be shown that the 
less capable requirement is strictly weaker than the noisier 
one (i.e. each “noisier” channel is also “less capable” but 
the converse is not true), and hence strictly weaker than the 
degraded one. 

Definition 7. A compound wiretap channel is said to have a
saddle-point if

$$\sup_{P_X} \inf_{s} \big( I(X; Y_{s_1}) - I(X; Z_{s_2}) \big) = \inf_{s} \sup_{P_X} \big( I(X; Y_{s_1}) - I(X; Z_{s_2}) \big) \qquad (28)$$

where $s = (s_1, s_2)$ is the aggregate channel state.

Note that this definition does not impose any operational 
meaning on the quantities involved. The following corollary 
provides such operational meaning. 

Corollary 1. If the compound wiretap channel $\mathfrak{W}$ has a
saddle-point and satisfies the less capable condition, then the
compound secrecy capacity $C_c$ is the same as the worst-case
channel capacity $C_w$,

$$C_c = \sup_{P_X} \Big( \inf_{s_1 \in \mathcal{S}_1} I(X; Y_{s_1}) - \sup_{s_2 \in \mathcal{S}_2} I(X; Z_{s_2}) \Big) = \inf_{s} \sup_{P_X} \big( I(X; Y_{s_1}) - I(X; Z_{s_2}) \big) = C_w. \qquad (29)$$

In particular, the channel has a saddle-point if

1) $\mathcal{S}_1$, $\mathcal{S}_2$ are compact and convex, and

2) $I(X; Y_{s_1}) - I(X; Z_{s_2})$ is lower semi-continuous and
quasi-convex in $s$, and upper semi-continuous and quasi-concave
in $P_X$.

Proof: Since the legitimate and eavesdropper channel
states are independent of each other, it follows that

$$\inf_{s_1 \in \mathcal{S}_1} I(X;Y_{s_1}) - \sup_{s_2 \in \mathcal{S}_2} I(X;Z_{s_2}) = \inf_{s \in \mathcal{S}} \big( I(X;Y_{s_1}) - I(X;Z_{s_2}) \big) \tag{30}$$

so that the following chain of inequalities holds

$$C_w = \inf_{s} \sup_{P_X} \big( I(X;Y_{s_1}) - I(X;Z_{s_2}) \big) = \sup_{P_X} \inf_{s} \big( I(X;Y_{s_1}) - I(X;Z_{s_2}) \big) \le C_c \le C_w \tag{31}$$

where the first equality holds since, from [6, Corollary 3.5],

$$\sup_{P_X} \big( I(X;Y_{s_1}) - I(X;Z_{s_2}) \big) \tag{32}$$

is the secrecy capacity under channel state $(s_1, s_2)$ and the
less capable condition, so that taking $\inf_s$ gives the worst-case
capacity; the second equality is the saddle-point property (28);
the first inequality is due to Theorem 2 and the
last inequality is due to the fact that the compound capacity cannot
exceed the worst-case one (since the compound code also has
to work on the worst-case channel). This proves $C_c = C_w$. The
last statement follows from von Neumann's mini-max theorem
and its subsequent generalizations, see e.g. [33, Theorem 9.D].

■ 

Remark 8. The importance of this result is due to the fact that 
a code designed for the worst-case channel also works on the 
whole class of channels , i.e. is robust (which is not true in 
general). 

Remark 9. The requirement of semi-continuity can be dropped 
in the case of countably-finite alphabets (since the mutual 
information is known to be continuous in such settings), but 
it is essential for countably-infinite or continuous alphabets. 

Remark 10. Since “noisier” implies “less capable”, Corollary
1 also holds for “noisier” and degraded (physically or
stochastically) channels.

Thus, Theorem 2 extends Theorem 1 to arbitrary uncertainty 
sets. The next step is to extend this result to continuous 
alphabets. 

III. Continuous Alphabets 

To establish an achievable secrecy rate for the compound 
Gaussian MIMO wiretap channel, we have to deal with 
continuous input and output alphabets as well as probability 
density functions. Therefore, we extend the previous result in 
Theorem 2 to continuous alphabets. 

Let us consider the general case of input and output
alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ which are standard [34]. Such alphabets
include practically relevant cases such as continuous alphabets
in Euclidean spaces or finite alphabets (see [34] and [35] for
an extensive discussion of this; the requirement of random
variables to be defined over a standard space ensures that
conditional probability measures are well-defined). Accordingly,
we assume that the corresponding random variables can be
described by probability density functions and that all
mutual information terms are calculated according to continuous
alphabets and are finite.

Usually, results are extended from discrete memoryless
channels to continuous channels by using the discretization
procedure or partitioning method as outlined for example
in [36]; see [35] or [37] respectively for a more detailed
treatment. Such an approach invokes quantization arguments,
where for any input distribution $p_X$ for the continuous channel,
the input and output alphabets are partitioned, making the
results for finite alphabets applicable. Letting the corresponding
quantizer be sufficiently fine, the actual mutual information
terms of the partitioned alphabets can be made arbitrarily close
to the continuous one.

Applying this approach to compound channels has to be 
done carefully. We have to ensure that the sequence of 
successively finer quantizers partitions the input and output 
alphabets in such a way that the mutual information terms
between the quantized alphabets approach the desired terms
for continuous alphabets for all possible channel realizations
simultaneously. Thus, the invoked quantizers must not depend 
on a particular channel realization. This issue is discussed in 
detail in [26] which studies the compound channel with side 
information. The following result is a slight extension of the 
corresponding result in [26] to the wiretap channel setting. 

Lemma 3. For the compound wiretap channel $\mathfrak{W}$ with
standard input and output alphabets, there exists a sequence
of successively finer quantizers $\{q_{X,k}, q_{Y,k}, q_{Z,k}\}_{k \in \mathbb{N}}$ for the
input and outputs such that for any channel realization $s \in \mathcal{S}$

$$I(X;Y_s) = \lim_{k \to \infty} I\big(q_{X,k}(X); q_{Y,k}(Y_s)\big)$$

$$I(X;Z_s) = \lim_{k \to \infty} I\big(q_{X,k}(X); q_{Z,k}(Z_s)\big).$$

This means there exist universal sequences of quantizers which
work for all channel realizations $s \in \mathcal{S}$ simultaneously if the
input and output alphabets are standard.

Proof: See [26, Lemma 3] and also [35] for further 
details. ■ 
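The convergence in Lemma 3 can be observed numerically for a single channel. The following is a minimal sketch, not the construction of [26]: it assumes a jointly Gaussian pair $(X, Y)$ with correlation coefficient `rho` (a hypothetical value), truncates the plane to $[-6, 6]^2$, and uses nested uniform quantizers with a power-of-two number of cells; the names `discrete_mi` and `quantized_mi` are illustrative.

```python
import numpy as np

def discrete_mi(p):
    """Mutual information (in nats) of a joint pmf given as a 2-D array."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

rho = 0.8                      # assumed correlation between X and Y
n_fine = 1024                  # resolution of the reference grid
edges = np.linspace(-6.0, 6.0, n_fine + 1)
mid = 0.5 * (edges[:-1] + edges[1:])
X, Y = np.meshgrid(mid, mid, indexing="ij")

# Bivariate Gaussian density sampled at cell midpoints, then normalized
# so that it is a valid pmf on the fine grid.
dens = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
p_fine = dens / dens.sum()

def quantized_mi(k):
    """MI after merging the fine grid into k uniform cells per axis."""
    b = n_fine // k
    p = p_fine.reshape(k, b, k, b).sum(axis=(1, 3))
    return discrete_mi(p)

levels = [2, 4, 8, 16, 32]     # successively finer, nested quantizers
mi_values = [quantized_mi(k) for k in levels]
mi_continuous = -0.5 * np.log(1 - rho**2)   # closed form for Gaussians

for k, v in zip(levels, mi_values):
    print(f"k = {k:2d}: I = {v:.4f} nats")
print(f"continuous: I = {mi_continuous:.4f} nats")
```

Because the quantizers are nested, the quantized mutual information is non-decreasing in `k` (data processing) and approaches the continuous value from below. The compound setting additionally requires the same quantizers to work for every state $s$, which this single-channel sketch does not address.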

The second technicality is that one has to ensure that such 
sequences of functions converge uniformly on a compact 
set when they converge pointwise (this is needed since the 
transmitter does not know the channel state). 

Lemma 4. Let $W_s$, $V_s$ be continuously parametrized by $s \in \mathcal{S}$,
where $\mathcal{S}$ is a compact set. Then, for all channel
realizations $s \in \mathcal{S}$ and for each input distribution $p_X$, there
exists a sequence of successively finer universal quantizers
$\{q_{X,k}, q_{Y,k}, q_{Z,k}\}_{k \in \mathbb{N}}$ such that for each $\epsilon > 0$, there is an
$n(\epsilon) \in \mathbb{N}$ such that for every $k > n(\epsilon)$,

$$I\big(q_{X,k}(X); q_{Y,k}(Y_s)\big) - I\big(q_{X,k}(X); q_{Z,k}(Z_s)\big) \ge \inf_{s \in \mathcal{S}} I(X;Y_s) - \sup_{s \in \mathcal{S}} I(X;Z_s) - \epsilon.$$

Proof: The proof follows by applying [26, Lemmas 4 and 5]
to both terms $I\big(q_{X,k}(X); q_{Y,k}(Y_s)\big)$ and
$I\big(q_{X,k}(X); q_{Z,k}(Z_s)\big)$. ■

Having these technicalities in mind, we are now in the
position to establish the desired result for continuous alphabets.

Theorem 3. The compound secrecy capacity $C_c$ of the wiretap
channel $\mathfrak{W}$ continuous in $s$ and with standard (possibly
continuous) input and output alphabets is bounded as follows:

$$C_c \ge \sup_{P_X} \Big( \inf_{s_1 \in \mathcal{S}_1} I(X;Y_{s_1}) - \sup_{s_2 \in \mathcal{S}_2} I(X;Z_{s_2}) \Big) \tag{33}$$

for any compact uncertainty set $\mathcal{S}$, and the equality is attained
if $V_{s_2}$ is noisier than $W_{s_1}$.

Proof: To prove the lower bound, we follow the
discretization procedure and use sequences of successively finer
quantizers $\{q_{X,k}, q_{Y,k}, q_{Z,k}\}_{k \in \mathbb{N}}$ according to Lemmas 3 and
4 which partition the continuous input and output alphabets
in such a way that we end up with mutually disjoint events
which cover the entire spaces. Then, all mutual information
terms are calculated according to these partitions.

For each choice of quantizers $q_{X,k}, q_{Y,k}, q_{Z,k}$, the whole
encoding and decoding procedure as used in the proofs of
Theorems 1 and 2, cf. also [18], is done according to this
partition. Then, the analysis of probability of error and the
analysis of the secrecy criterion for finite alphabets ensures
that any rate $R_s$ satisfying

$$R_s < \sup_{P_X} \Big( \inf_{s \in \mathcal{S}} I\big(q_{X,k}(X); q_{Y,k}(Y_s)\big) - \sup_{s \in \mathcal{S}} I\big(q_{X,k}(X); q_{Z,k}(Z_s)\big) \Big)$$

is achievable with strong secrecy for the compound wiretap
channel.

From Lemmas 3 and 4, for standard alphabets $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$,
and any $\epsilon > 0$, one can find sequences of successively finer
quantizers $\{q_{X,k}, q_{Y,k}, q_{Z,k}\}_{k \in \mathbb{N}}$ such that for sufficiently large
$n$ and any $k > n$,

$$\inf_{s \in \mathcal{S}} I\big(q_{X,k}(X); q_{Y,k}(Y_s)\big) - \sup_{s \in \mathcal{S}} I\big(q_{X,k}(X); q_{Z,k}(Z_s)\big) \ge \inf_{s \in \mathcal{S}} I(X;Y_s) - \sup_{s \in \mathcal{S}} I(X;Z_s) - \epsilon \tag{34}$$

so that any rate

$$R_s < \sup_{P_X} \Big( \inf_{s \in \mathcal{S}} I(X;Y_s) - \sup_{s \in \mathcal{S}} I(X;Z_s) \Big) - \epsilon$$

is achievable for standard (continuous) alphabets as well, from
which (33) follows. Note that as the uncertainty set is assumed
to be compact and therewith bounded, all terms are well
defined and finite for standard alphabets as well. The equality
part is established as in Theorem 2 (using the upper bounds in
(24)-(26), which apply to continuous alphabets as well). This
completes the proof. ■

Using this theorem, Corollary 1 can be extended to
continuous alphabets in a natural way.

IV. Gaussian MIMO Channels 

We are now in the position to specialize the result in
Theorem 3 to Gaussian MIMO channels. To this end, let $N_t$
be the number of transmit antennas at the transmitter and $N_{1(2)}$
be the number of receive antennas at the legitimate receiver
(eavesdropper). The input-output relations for the Gaussian
MIMO wiretap channel are given by

$$\mathbf{y}_1 = \mathbf{H}_1 \mathbf{x} + \boldsymbol{\xi}_1, \qquad \mathbf{y}_2 = \mathbf{H}_2 \mathbf{x} + \boldsymbol{\xi}_2 \tag{35}$$

where $\mathbf{x} = [x_1, x_2, \ldots, x_{N_t}]^T \in \mathbb{C}^{N_t \times 1}$ is the transmitted
signal, $\mathbf{y}_{1(2)} \in \mathbb{C}^{N_{1(2)} \times 1}$ is the received signal at the legitimate receiver
(eavesdropper), $\boldsymbol{\xi}_{1(2)} \in \mathbb{C}^{N_{1(2)} \times 1}$ is the circularly-symmetric
additive white Gaussian noise at the receiver (eavesdropper)
(normalized to unit variance in each dimension), and $\mathbf{H}_{1(2)} \in \mathbb{C}^{N_{1(2)} \times N_t}$
is the matrix of the complex channel gains between
each transmit and each receive (eavesdropper) antenna. The
channels $\mathbf{H}_{1(2)}$ are assumed to be fixed (constant) during the
whole transmission of block length $n$. We assume an average
transmit power constraint $\mathrm{tr}\,\mathbf{R} \le P_T$ where $P_T$ is the total
transmit power and $\mathbf{R} = E\{\mathbf{x}\mathbf{x}^{+}\}$ is the transmit covariance
matrix.

For this channel, the secrecy capacity subject to the total
average transmit power constraint is [8-11]

$$C_s = \max_{\mathbf{R}} \ln \frac{|\mathbf{I} + \mathbf{W}_1 \mathbf{R}|}{|\mathbf{I} + \mathbf{W}_2 \mathbf{R}|} \tag{36}$$

where $\mathbf{W}_i = \mathbf{H}_i^{+} \mathbf{H}_i$, $i = 1, 2$, and max is subject to the
constraints $\mathbf{R} \ge 0$ and $\mathrm{tr}\,\mathbf{R} \le P_T$.
It is well-known that the problem in (36) is not convex
in general and explicit solutions for the optimal transmit
covariance are not known for the general case, but only for
some special cases (e.g. low-SNR, MISO channels, or for the
full-rank case) [8-11,13].

Let us now consider a compound Gaussian MIMO wiretap
channel where the exact channel realizations $\mathbf{H}_1$ and $\mathbf{H}_2$ are
unknown. It is only known to the legitimate user that they
belong to the compact set $\mathcal{S}$. Again, we make the safest
assumption from the secrecy perspective and assume that
the eavesdropper knows both $\mathbf{H}_1$ and $\mathbf{H}_2$, cf. also Remark
2. Then, evaluating Theorem 3 for this particular choice
of compound Gaussian MIMO channel yields the following
achievable secrecy rate.


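Corollary 2. The (strong) compound secrecy capacity $C_c$ of
the compound Gaussian MIMO channel in (35) is
lower-bounded as follows:

$$C_c \ge \max_{\mathbf{R}} \min_{\mathbf{W}_1, \mathbf{W}_2} \ln \frac{|\mathbf{I} + \mathbf{W}_1 \mathbf{R}|}{|\mathbf{I} + \mathbf{W}_2 \mathbf{R}|}$$

where max and min are subject to $\mathbf{R}, \mathbf{W}_1, \mathbf{W}_2 \ge 0$, $\mathrm{tr}\,\mathbf{R} \le P_T$,
and $\mathbf{W}_1, \mathbf{W}_2$ belong to a compact set $\mathcal{S}$.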


A similar result was given earlier in [17, Lemma 1] 
under the weak secrecy constraint and finite-state channels 
(countably-finite uncertainty sets). Corollary 2 extends it to 
strong secrecy and arbitrary (compact) uncertainty sets. 


V. Eavesdropper Channel Uncertainty 


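Let us now consider a particular compound channel where
$\mathbf{H}_1$ is given (known to the transmitter) and $\mathbf{H}_2$ can be any
(unknown) matrix subject to the spectral norm constraint

$$\mathcal{S}_2 = \Big\{ \mathbf{H}_2 : |\mathbf{H}_2|_2 = \max_{|\mathbf{x}| = 1} |\mathbf{H}_2 \mathbf{x}| \le \sqrt{\epsilon} \Big\} = \big\{ \mathbf{W}_2 : |\mathbf{W}_2|_2 = \lambda_1(\mathbf{W}_2) \le \epsilon \big\} \tag{37}$$

where $|\mathbf{x}| = \sqrt{\mathbf{x}^{+}\mathbf{x}}$ is the Euclidean norm of $\mathbf{x}$, $|\mathbf{H}|_2 = \sigma_1(\mathbf{H})$
is the spectral norm of $\mathbf{H}$, i.e. its largest singular value $\sigma_1(\mathbf{H})$;
$\lambda_1(\mathbf{W}_2)$ is the largest eigenvalue of $\mathbf{W}_2$. Thus, the set $\mathcal{S}_2$
includes all $\mathbf{W}_2$ that are less than or equal to $\epsilon\mathbf{I}$.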
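The equivalence of the two descriptions in (37) rests on $|\mathbf{H}_2|_2^2 = \lambda_1(\mathbf{H}_2^{+}\mathbf{H}_2)$. A minimal numerical sketch, assuming hypothetical antenna counts and a randomly drawn channel (neither is taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N2, Nt = 3, 4   # hypothetical numbers of eavesdropper / transmit antennas

H2 = rng.standard_normal((N2, Nt)) + 1j * rng.standard_normal((N2, Nt))
W2 = H2.conj().T @ H2                      # W2 = H2^+ H2

spec_norm_H = np.linalg.norm(H2, 2)        # largest singular value of H2
lam_max_W = np.linalg.eigvalsh(W2)[-1]     # largest eigenvalue of W2

# Voltage gain |H2 x| in many random unit-norm transmit directions x:
x = rng.standard_normal((Nt, 1000)) + 1j * rng.standard_normal((Nt, 1000))
x /= np.linalg.norm(x, axis=0)
max_gain = np.linalg.norm(H2 @ x, axis=0).max()

print("sigma_1(H2)^2 =", spec_norm_H**2)
print("lambda_1(W2)  =", lam_max_W)
print("largest sampled gain:", max_gain, "<= sigma_1(H2) =", spec_norm_H)
```

No sampled direction exceeds the spectral norm, which is why bounding $|\mathbf{H}_2|_2$ bounds the eavesdropper gain in every transmit direction at once.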

Note that $|\mathbf{H}\mathbf{x}|$ represents the channel (voltage) gain in
transmit direction $\mathbf{x}$ so that $|\mathbf{H}|_2$ is the largest channel
gain. $|\mathbf{W}|_2$ represents the largest channel power gain. The
importance of the spectral norm in the context of regular
MIMO channels has been discussed in [38]. Essentially the
same motivation applies to the secure MIMO channel here.
In particular, the set in (37) limits the maximum gain of the
eavesdropper channel without putting any constraint on its
eigenvectors. This represents the physical scenario where the
eavesdropper cannot approach the transmitter beyond a certain
minimum (protection) distance (so that the channel gain is
bounded due to propagation path loss) being unconstrained
otherwise.

To establish the secrecy capacity of this compound channel
(not necessarily degraded) in Theorem 4, we first establish a
number of intermediate results in Propositions 2 and 3.



A. Worst-Case Secrecy Capacity and Saddle-Point Property 

The following proposition gives the capacity of the worst-case
channel in this set. For this purpose we define

$$C(\mathbf{R}, \mathbf{W}_2) = \ln \frac{|\mathbf{I} + \mathbf{W}_1 \mathbf{R}|}{|\mathbf{I} + \mathbf{W}_2 \mathbf{R}|} \tag{38}$$

which depends on the transmit covariance matrix $\mathbf{R}$ and the
eavesdropper channel $\mathbf{W}_2 = \mathbf{H}_2^{+}\mathbf{H}_2$ unknown to the
transmitter. The channel to the legitimate receiver $\mathbf{W}_1 = \mathbf{H}_1^{+}\mathbf{H}_1$
is fixed and known to the transmitter.


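Proposition 2. Consider the class of channels in (35) for a
given (known) $\mathbf{W}_1$ and any $\mathbf{W}_2 \in \mathcal{S}_2$ (as in (37)). Then, the
secrecy capacity $C_w$ of a worst-case channel is

$$C_w = \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = C^*(\epsilon) \tag{39}$$

where max and min are over all admissible $\mathbf{R}, \mathbf{W}_2$: $\mathbf{R}, \mathbf{W}_2 \ge 0$,
$\mathrm{tr}\,\mathbf{R} \le P_T$, $\mathbf{W}_2 \in \mathcal{S}_2$, i.e. $|\mathbf{W}_2|_2 \le \epsilon$, and

$$C^*(\epsilon) = \max_{\mathrm{tr}\,\mathbf{R} \le P_T} C(\mathbf{R}, \epsilon\mathbf{I}) \tag{40}$$

is the secrecy capacity for the isotropic eavesdropper $\mathbf{W}_2 = \epsilon\mathbf{I}$,
which is the worst-case eavesdropper in $\mathcal{S}_2$.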

Proof: Observe that $|\mathbf{I} + \mathbf{W}\mathbf{R}|$ is monotonically increasing
in $\mathbf{W}$, i.e.

$$|\mathbf{I} + \mathbf{W}_1 \mathbf{R}| \ge |\mathbf{I} + \mathbf{W}_2 \mathbf{R}| \quad \text{if } \mathbf{W}_1 \ge \mathbf{W}_2$$

(see e.g. [39]), so that

$$C(\mathbf{R}, \mathbf{W}_2) \ge C(\mathbf{R}, \epsilon\mathbf{I})$$

for any $\mathbf{R}$, with equality if $\mathbf{W}_2 = \epsilon\mathbf{I}$. Taking min max of both
parts results in (39). ■

It follows from Proposition 2 that the isotropic eavesdropper
is the worst-case one under a bounded channel gain for any
$\mathbf{W}_1$. This is also appealing from the channel feedback
perspective: it is hardly possible to expect that the eavesdropper will
share its channel with the transmitter to make eavesdropping
harder, so only minimal information can be expected by the
transmitter about the eavesdropper channel.

The secrecy capacity $C^*(\epsilon)$ under the isotropic
eavesdropper has been studied in detail in [40], including its high/low
SNR approximations and capacity bounds for the general
(non-isotropic) case. In particular, $C^*(\epsilon)$ is a decreasing, convex
function of $\epsilon$. As Fig. 1 shows, the presence of the eavesdropper
results in capacity saturation at high SNR, where the
eavesdropper’s impact is much more pronounced.

Fig. 1. Secrecy capacity for the isotropic eavesdropper and the capacity of
the regular MIMO channel (no eavesdropper, $\epsilon = 0$) vs. the SNR ($= P_T$
since the noise variance is unity); $\lambda_1(\mathbf{W}_1) = 2$, $\lambda_2(\mathbf{W}_1) = 1$. Note the
saturation effect at high SNR, where the capacity strongly depends on $\epsilon$ but
not the SNR, and the negligible impact of the eavesdropper at low SNR.

The following proposition demonstrates the saddle-point
property for the class of channels in (37) which will be
important later to prove the converse result for the compound
secrecy capacity.

Proposition 3. Consider the class of channels in (35) for a
given (known) $\mathbf{W}_1$ and any $\mathbf{W}_2 \in \mathcal{S}_2$. The following
saddle-point property holds:

$$\max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) \tag{41}$$

where max and min are over all admissible $\mathbf{R}, \mathbf{W}_2$.

Proof: For the max-min part, observe that $C(\mathbf{R}, \mathbf{W}_2) \ge C(\mathbf{R}, \epsilon\mathbf{I})$
(which follows from the proof of Proposition 2), so
by taking max min of both parts, one obtains

$$\max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) \ge \max_{\mathbf{R}} C(\mathbf{R}, \epsilon\mathbf{I}). \tag{42}$$

On the other hand, by using $\mathbf{W}_2 = \epsilon\mathbf{I}$ instead of min, one
obtains

$$\max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) \le \max_{\mathbf{R}} C(\mathbf{R}, \epsilon\mathbf{I}) \tag{43}$$

so that

$$\max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) = \max_{\mathbf{R}} C(\mathbf{R}, \epsilon\mathbf{I}) = \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2). \tag{44}$$

This proves the desired saddle-point property. ■
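The key step of both proofs, $C(\mathbf{R}, \mathbf{W}_2) \ge C(\mathbf{R}, \epsilon\mathbf{I})$ whenever $|\mathbf{W}_2|_2 \le \epsilon$, can be spot-checked numerically. The sketch below assumes hypothetical values for the dimension, $\epsilon$ and $P_T$, and draws random feasible $\mathbf{R}$ and $\mathbf{W}_2$; the helper names are illustrative.

```python
import numpy as np

def secrecy_rate(R, W1, W2):
    """C(R, W2) = ln|I + W1 R| - ln|I + W2 R| in nats, cf. (38)."""
    eye = np.eye(R.shape[0])
    _, logdet1 = np.linalg.slogdet(eye + W1 @ R)
    _, logdet2 = np.linalg.slogdet(eye + W2 @ R)
    return float(logdet1 - logdet2)

def random_psd(rng, n):
    """Random Hermitian positive semi-definite n x n matrix."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T

rng = np.random.default_rng(1)
n, eps, P_T = 3, 0.5, 10.0            # hypothetical parameters
W1 = random_psd(rng, n)               # fixed legitimate channel

gaps = []
for _ in range(200):
    R = random_psd(rng, n)
    R *= P_T / np.trace(R).real       # enforce tr R = P_T
    W2 = random_psd(rng, n)
    # Rescale so that the largest eigenvalue lies in [0, eps].
    W2 *= rng.uniform(0.0, eps) / np.linalg.eigvalsh(W2)[-1]
    gaps.append(secrecy_rate(R, W1, W2)
                - secrecy_rate(R, W1, eps * np.eye(n)))

min_gap = min(gaps)
print("smallest C(R, W2) - C(R, eps*I) over the samples:", min_gap)
```

The gap is non-negative (up to rounding) for every sample, consistent with the isotropic eavesdropper being the worst case in the set (37).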


B. Compound Secrecy Capacity 

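The saddle-point property above is instrumental in
establishing the secrecy capacity of the compound MIMO channel
in (35) and (37) as the following theorem shows.

Theorem 4. Consider the compound Gaussian MIMO wiretap
channel in (35) with known $\mathbf{W}_1$ and unknown $\mathbf{W}_2$ belonging
to the uncertainty set $\mathcal{S}_2$ in (37). Its compound secrecy
capacity $C_c$ is

$$C_c = \max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = C^*(\epsilon) \tag{45}$$

where max and min are over all admissible $\mathbf{R}, \mathbf{W}_2$. The
optimal signaling is Gaussian and on the eigenmodes of the
legitimate channel,

$$\mathbf{R}^* = \mathbf{U}_1 \mathbf{\Lambda}^* \mathbf{U}_1^{+}, \tag{46}$$

where the columns of unitary matrix $\mathbf{U}_1$ are the eigenvectors of
$\mathbf{W}_1$, diagonal matrix $\mathbf{\Lambda}^* = \mathrm{diag}\{\lambda_i^*\}$ collects the eigenvalues
of $\mathbf{R}^*$,

$$\lambda_i^* = \frac{\epsilon + g_i}{2 \epsilon g_i} \left( \sqrt{1 + \frac{4 \epsilon g_i}{(\epsilon + g_i)^2} \left( \frac{g_i - \epsilon}{\lambda} - 1 \right)_{+}} - 1 \right) \tag{47}$$

and $\lambda > 0$ is found from the total power constraint $\sum_i \lambda_i^* = P_T$,
$g_i = \lambda_i(\mathbf{W}_1)$, $(x)_+ = \max\{x, 0\}$. The secrecy capacity
can be expressed as

$$C^*(\epsilon) = \sum_{i \in i_+} \ln \frac{1 + g_i \lambda_i^*}{1 + \epsilon \lambda_i^*} = \sum_{i \in i_+} \ln \frac{g_i}{\epsilon} + \sum_{i \in i_+} \ln \frac{2\epsilon + (\epsilon + g_i) z_i}{2 g_i + (\epsilon + g_i) z_i} \tag{48}$$

where $z_i$ is the bracketed factor in (47), i.e. $\lambda_i^* = \frac{\epsilon + g_i}{2 \epsilon g_i} z_i$,
and the summation is over the set of active eigenmodes:

$$i_+ = \{ i : g_i > \lambda + \epsilon \}. \tag{49}$$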
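The power allocation in (47) is explicit up to the multiplier $\lambda$, which can be found numerically from the power constraint. The sketch below is one hedged way to do this, not the authors' procedure: it assumes strictly positive eigenvalues $g_i$, $\epsilon > 0$ and $\max_i g_i > \epsilon$, and uses plain bisection on $\lambda \in (0, g_1 - \epsilon)$; the function names and the example numbers (which reuse the eigenvalues quoted for Fig. 1) are illustrative.

```python
import numpy as np

def power_allocation(g, eps, lam):
    """Eigenmode powers from (47) for a given multiplier lam > 0."""
    g = np.asarray(g, dtype=float)
    t = np.maximum((g - eps) / lam - 1.0, 0.0)      # the (.)_+ term
    return (eps + g) / (2 * eps * g) * (
        np.sqrt(1 + 4 * eps * g / (eps + g) ** 2 * t) - 1)

def solve_allocation(g, eps, P_T, iters=200):
    """Bisection on lam so that the powers sum to P_T."""
    g = np.asarray(g, dtype=float)
    lo, hi = 0.0, float(np.max(g) - eps)   # total power decreases in lam
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if power_allocation(g, eps, lam).sum() > P_T:
            lo = lam                       # too much power: raise lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return lam, power_allocation(g, eps, lam)

def isotropic_capacity(g, eps, powers):
    """Sum of ln((1 + g_i p_i) / (1 + eps p_i)), first form of (48)."""
    g = np.asarray(g, dtype=float)
    return float(np.sum(np.log((1 + g * powers) / (1 + eps * powers))))

g = np.array([2.0, 1.0])     # eigenvalues of W1, in decreasing order
eps = 0.5                    # assumed eavesdropper gain bound

lam_hi, p_hi = solve_allocation(g, eps, P_T=10.0)   # both modes active
lam_lo, p_lo = solve_allocation(g, eps, P_T=0.3)    # beamforming regime
print("P_T = 10 :", p_hi, "C* =", isotropic_capacity(g, eps, p_hi))
print("P_T = 0.3:", p_lo, "C* =", isotropic_capacity(g, eps, p_lo))
```

At the higher power both eigenmodes are active and the stronger one receives more power; at the lower power only the strongest eigenmode is used, in line with the active set (49).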

Proof: Note first that

$$C_c \le \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2), \tag{50}$$

i.e., the compound capacity cannot exceed the worst-case
capacity in the class and the latter is achieved by Gaussian
signaling. On the other hand, it follows from Corollary 2 that

$$C_c \ge \max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) \tag{51}$$

$$= \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) \tag{52}$$

where the equality is from Proposition 3. Combining the lower
and upper bounds, (45) and optimality of Gaussian signaling
for the compound channel follow. The optimal covariance in
(46)-(47) and the capacity in (48) follow from Proposition 2
in [40] since the worst-case eavesdropper is isotropic. ■

Note that this theorem does not require the compound
channel to be degraded (as is the case for the known capacity
results, where all eavesdropper channel states are required
to be degraded with respect to all legitimate user channel
states). It shows that the secrecy capacity of the worst-case
channel is also the (compound) secrecy capacity of the class
of channels (achievable by a single code on the whole class) so
that Gaussian signaling is optimal, and the following
saddle-point inequalities hold for any feasible $\mathbf{R}$ and $\mathbf{W}_2$,

$$C(\mathbf{R}, \epsilon\mathbf{I}) \le C_c = C(\mathbf{R}^*, \epsilon\mathbf{I}) \le C(\mathbf{R}^*, \mathbf{W}_2) \tag{53}$$

where $(\mathbf{R}^*, \epsilon\mathbf{I})$ is the saddle-point. The inequalities in (53)
follow from (45), cf. also [33,41]. It is remarkable that this
result holds for any $\mathbf{W}_1$ and hence does not require the
channel to be degraded (unlike all results known to date).
The saddle-point property in Proposition 3 is instrumental in
establishing the optimality of Gaussian signaling and hence the
compound secrecy capacity for the non-degraded case (using
this property avoids the need to prove the converse directly,
which is the most difficult part of establishing the compound secrecy
capacity for the non-degraded case).

The inequalities in (53) have the well-known game-theoretic
interpretation: the transmitter sets $\mathbf{R} = \mathbf{R}^*$ and the adversary
(nature or eavesdropper) sets $\mathbf{W}_2 = \epsilon\mathbf{I}$; neither player can
deviate from this strategy without incurring a penalty (provided
that the other player follows it).

Note that the optimal signaling directions that achieve the
compound capacity are the same as those for the regular
MIMO channel (no eavesdropper) but the power allocation
$\{\lambda_i^*\}$ is somewhat different from the regular water-filling
(WF), even though it shares many of its properties, which
are summarized below (see [40] for further details).

Proposition 4. Properties of the optimum power allocation:

1) $\lambda_i^*$ is an increasing function of $g_i$ (strictly increasing
unless $\lambda_i^* = 0$ or $P_T$), i.e. stronger eigenmodes get
more power (as in the standard WF).

2) $\lambda_i^*$ is an increasing function of $P_T$ (strictly increasing
unless $\lambda_i^* = 0$). $\lambda_i^* = 0$ for $i > 1$ and $\lambda_1^* = P_T$ as
$P_T \to 0$ if $g_1 > g_2$, i.e. only the strongest eigenmode is
active at low SNR, and $\lambda_i^* > 0$ if $g_i > \epsilon$ as $P_T \to \infty$,
i.e. all sufficiently strong eigenmodes are active at high
SNR.

3) $\lambda_i^* > 0$ only if $g_i > \epsilon$, i.e. only the legitimate eigenmodes
stronger than the eavesdropper ones can be active.

4) $\lambda$ is a strictly decreasing function of $P_T$ and $0 < \lambda < g_1 - \epsilon$;
$\lambda \to 0$ as $P_T \to \infty$ and $\lambda \to g_1 - \epsilon$ as $P_T \to 0$.

5) There are $m_+$ active eigenmodes if the following
inequalities hold:

$$P_{m_+} < P_T \le P_{m_+ + 1} \tag{54}$$

where $P_{m_+}$ is a threshold power (to have at least $m_+$
active eigenmodes):

$$P_{m_+} = \sum_{i=1}^{m_+ - 1} \frac{\epsilon + g_i}{2 \epsilon g_i} \left( \sqrt{1 + \frac{4 \epsilon g_i}{(\epsilon + g_i)^2} \frac{g_i - g_{m_+}}{(g_{m_+} - \epsilon)_{+}}} - 1 \right) \tag{55}$$

for $m_+ = 2, \ldots, N_t$ and $P_1 = 0$, so that $m_+$ is an
increasing function of $P_T$.

The two terms in (48) represent the high-SNR asymptote
and its (negative) correction term of the secrecy capacity
respectively, so that

$$C^*(\epsilon) \approx \sum_{i \in i_+} \ln \frac{g_i}{\epsilon} \tag{56}$$

as SNR $\to \infty$. In this regime, only those eigenmodes are
active which are stronger than the eavesdropper ($g_{i_+} > \epsilon$).
Fig. 2. An example of two uncertainty sets when $\mathbf{W}_2 = \mathrm{diag}\{d_1, d_2\} \ge 0$.
The (whole) set $\mathcal{S}_a$ corresponds to the uncertainty set given in (37), while
the shaded set $\mathcal{S}_b$ corresponds to (60).


Fig. 3. An example of two uncertainty sets $\mathcal{S}_a$ and $\mathcal{S}_b$ when $m = 2$ and
$\mathbf{W}_2 = \mathrm{diag}\{d_1, d_2\} \ge 0$. $\mathcal{S}_a$ has a (unique) maximum element $\mathbf{W}_2^*$ (dark
dot) while $\mathcal{S}_b$ does not, but only a set of maximal elements (dark line) $\mathcal{S}_{2m}$.


Since the 2nd term is negative and increasing, it follows that

$$C^*(\epsilon) < \sum_{i \in i_+} \ln \frac{g_i}{\epsilon} \tag{57}$$

at any SNR. Fig. 1 illustrates this regime.

At low SNR, only the strongest mode is active and

$$C^*(\epsilon) = \ln \frac{1 + g_1 P_T}{1 + \epsilon P_T} \approx (g_1 - \epsilon) P_T \tag{58}$$

where $g_i$ are in decreasing order, and the 2nd equality holds when
$(g_1 - \epsilon) P_T \ll 1$. It follows from (55) that only one eigenmode
is active, i.e. beamforming is optimal (which is appealing from
a practical perspective due to its low complexity), when

$$P_T \le \frac{\epsilon + g_1}{2 \epsilon g_1} \left( \sqrt{1 + \frac{4 \epsilon g_1}{(\epsilon + g_1)^2} \frac{g_1 - g_2}{(g_2 - \epsilon)_{+}}} - 1 \right) \tag{59}$$

In particular, it is the case at any SNR if $g_2 \le \epsilon$ (provided that
$g_1 > \epsilon$), i.e. when the eavesdropper uncertainty is sufficiently
large.


C. Broader Class of Compound MIMO Channels

The result in Theorem 4 can be further extended to a
broader class of compound MIMO channels. To this end, let us
generalize the uncertainty set $\mathcal{S}_2$ for the eavesdropper channel
as follows

$$\mathbf{W}_2 \in \mathcal{S}_2 \rightarrow \mathbf{W}_2 \le \epsilon\mathbf{I} \in \mathcal{S}_2, \tag{60}$$

i.e., all its members are less than or equal to $\epsilon\mathbf{I}$. Unlike (37),
it need not include all such $\mathbf{W}_2$; it is not required to be
convex, compact etc. Fig. 2 illustrates the difference between
the uncertainty sets defined in (37) and (60) for diagonal $\mathbf{W}_2$.

Proposition 5. Consider the compound Gaussian MIMO
wiretap channel in (35) when $\mathbf{W}_1$ is known and unknown $\mathbf{W}_2$
belongs to the uncertainty set $\mathcal{S}_2$ in (60). Its compound secrecy
capacity is $C_c = C^*(\epsilon)$, i.e., as in Theorem 4.

Proof: Observe that the compound secrecy capacity of
this channel is not smaller than that in Theorem 4, since
the uncertainty set here is included in the uncertainty set of
Theorem 4 (which includes all $\mathbf{W}_2 \le \epsilon\mathbf{I}$, since it is equivalent
to $\lambda_1(\mathbf{W}_2) \le \epsilon$). On the other hand, setting $\mathbf{W}_2 = \epsilon\mathbf{I}$
demonstrates that the lower bound is achieved by this
worst-case channel. Since the compound capacity does not exceed
the worst-case one, the desired result follows. ■

We remark that the set $\mathcal{S}_2$ is not necessarily convex or
compact (as required by Theorem 3), nor does it have some other
“nice” properties, except that $\epsilon\mathbf{I}$ is its dominant element, and
that Theorem 4 is a special case. This clearly demonstrates
the importance of the isotropic eavesdropper for compound
MIMO wiretap channels.

To generalize these results further, we will need the
following definitions.

Definition 8. Let $\mathcal{S}_2$ be an uncertainty set of $\mathbf{W}_2$. $\mathbf{W}_2^*$ is its
(unique) maximum element if $\mathbf{W}_2^* \in \mathcal{S}_2$ and $\forall\, \mathbf{W}_2 \in \mathcal{S}_2 \rightarrow \mathbf{W}_2 \le \mathbf{W}_2^*$.

Definition 9. $\mathbf{W}_{2m}$ is a maximal element of $\mathcal{S}_2$ if
$\mathbf{W}_2, \mathbf{W}_{2m} \in \mathcal{S}_2$, $\mathbf{W}_2 \ge \mathbf{W}_{2m} \rightarrow \mathbf{W}_2 = \mathbf{W}_{2m}$ (i.e. the
only element in $\mathcal{S}_2$ greater or equal to $\mathbf{W}_{2m}$ is $\mathbf{W}_{2m}$ itself).

Note that Definition 9 is due to the fact that not any two
positive semi-definite matrices can be compared (i.e. it can be
that neither $\mathbf{W}_1 \ge \mathbf{W}_2$ nor $\mathbf{W}_1 \le \mathbf{W}_2$ is true, unlike the
scalar case), so that a maximum element may not exist. While
a maximum element, if it exists, is unique, there may be many
maximal elements in a set (see e.g. [41] for more details).
Fig. 3 illustrates these definitions for the case of diagonal $\mathbf{W}_2$
and $m = 2$.
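The difference between Definitions 8 and 9 can be made concrete for a finite collection of matrices. The sketch below is an illustration only: it assumes finite sets of Hermitian matrices (the uncertainty sets in the text are generally infinite), tests the positive semi-definite ordering through the smallest eigenvalue of the difference, and uses hypothetical diagonal examples.

```python
import numpy as np

def loewner_leq(A, B, tol=1e-12):
    """True if A <= B in the positive semi-definite ordering."""
    return bool(np.linalg.eigvalsh(B - A)[0] >= -tol)

def maximum_element(mats):
    """Return the element dominating all others, or None (Definition 8)."""
    for M in mats:
        if all(loewner_leq(W, M) for W in mats):
            return M
    return None

def maximal_elements(mats):
    """Elements not strictly dominated by any other one (Definition 9)."""
    result = []
    for M in mats:
        dominated = any(loewner_leq(M, W) and not loewner_leq(W, M)
                        for W in mats)
        if not dominated:
            result.append(M)
    return result

# Hypothetical diagonal examples with m = 2.
set_a = [np.diag([1.0, 1.0]), np.diag([1.0, 0.2]), np.diag([0.3, 1.0])]
set_b = [np.diag([1.0, 0.2]), np.diag([0.3, 1.0]), np.diag([0.2, 0.1])]

print("set_a maximum:", maximum_element(set_a))
print("set_b maximum:", maximum_element(set_b))
print("set_b maximal elements:", len(maximal_elements(set_b)))
```

Here `set_a` has the maximum element $\mathrm{diag}\{1, 1\}$, whereas `set_b` has none: its two largest members are incomparable, so both are maximal.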

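We are now in a position to generalize Proposition 5.

Proposition 6. Consider the compound Gaussian MIMO
wiretap channel in (35) when $\mathbf{W}_1$ is known and unknown $\mathbf{W}_2$
belongs to an uncertainty set $\mathcal{S}_2$, whose maximum element is
$\mathbf{W}_2^*$. The saddle-point property holds, so that the compound
secrecy capacity $C_c$ equals the worst-case secrecy capacity
$C_w$:

$$C_c = \max_{\mathbf{R}} \min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2 \in \mathcal{S}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = C_w = \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2^*) \tag{61}$$

where the worst-case channel is $\mathbf{W}_2^*$, and the transmission on
this channel is optimal for the whole class of channels in $\mathcal{S}_2$.

Proof: Observe that

$$C(\mathbf{R}, \mathbf{W}_2) \ge C(\mathbf{R}, \mathbf{W}_2^*) \quad \forall\, \mathbf{R},\ \mathbf{W}_2 \in \mathcal{S}_2 \tag{62}$$

which is due to the fact that $|\mathbf{I} + \mathbf{W}\mathbf{R}|$ is monotonically
increasing in $\mathbf{W}$ [39] for any (positive semi-definite) $\mathbf{R}$, so
that, by using max min and min max on both sides, one
obtains

$$\max_{\mathbf{R}} \min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2) = \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2^*) = \min_{\mathbf{W}_2 \in \mathcal{S}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = C_w. \tag{63}$$

To prove the operational meaning of the max min part, observe
that Corollary 2 does not apply directly as $\mathcal{S}_2$ is not
necessarily compact. Instead, consider another compact set $\mathcal{S}_2'$ that
includes all positive semi-definite $\mathbf{W}_2$ such that $\mathbf{W}_2 \le \mathbf{W}_2^*$.
Clearly, this set is closed and bounded and hence compact, and
$\mathcal{S}_2 \subseteq \mathcal{S}_2'$, so that its compound capacity $C'$ satisfies $C' \le C_c$.
Applying Corollary 2 to $\mathcal{S}_2'$, one obtains

$$C_w = \min_{\mathbf{W}_2 \in \mathcal{S}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2 \in \mathcal{S}_2'} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = \max_{\mathbf{R}} \min_{\mathbf{W}_2 \in \mathcal{S}_2'} C(\mathbf{R}, \mathbf{W}_2) \le C' \le C_c \le C_w$$

where the 2nd equality is due to the fact that (62) holds for
$\mathcal{S}_2'$ as well so that $C_w$ is the same for $\mathcal{S}_2$ and $\mathcal{S}_2'$ (since both
sets have the same maximum element $\mathbf{W}_2^*$); the 3rd equality
is due to the fact that (63) holds for $\mathcal{S}_2'$ as well. This proves
$C' = C_c = C_w$ and hence the desired result. ■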

This proposition says, in effect, that the saddle-point
property holds and, thus, the compound secrecy capacity equals
the worst-case one, if a maximum element of the uncertainty
set $\mathcal{S}_2$ exists¹ and the rest of its structure is irrelevant.

¹Recall that it is not the case in general and many sets of positive
semi-definite matrices do not have a maximum element, as Fig. 3 shows.

When the uncertainty set does not have a maximum
element, its compound and worst-case secrecy capacities can be
characterized using maximal elements as follows.

Proposition 7. Consider the compound Gaussian MIMO
channel in (35) when $\mathbf{W}_1$ is known and unknown $\mathbf{W}_2$ belongs
to a bounded and closed uncertainty set $\mathcal{S}_2$, which does not
have a maximum element. Then,

$$\min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2 \in \mathcal{S}_{2m}} C(\mathbf{R}, \mathbf{W}_2) \quad \forall\, \mathbf{R} \tag{64}$$

where $\mathcal{S}_{2m}$ is the set of all maximal elements $\mathbf{W}_{2m}$ of $\mathcal{S}_2$,
and hence

$$C_w = \min_{\mathbf{W}_2 \in \mathcal{S}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2 \in \mathcal{S}_{2m}} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) \tag{65}$$

$$C_c \ge \max_{\mathbf{R}} \min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2) = \max_{\mathbf{R}} \min_{\mathbf{W}_2 \in \mathcal{S}_{2m}} C(\mathbf{R}, \mathbf{W}_2) \tag{66}$$

i.e. minimizing over the whole uncertainty set $\mathcal{S}_2$ is equivalent
to minimizing over the (normally much smaller) set of its maximal
elements.

Proof: Since the proof is highly technical, it is relegated
to Appendix C. ■

We remark that Proposition 7 effectively reduces the
dimensionality of the related optimization problem: if the original
problem in (65) is $D$-dimensional, the reduced one (on the
right hand side) is at most $(D - 1)$-dimensional, since $\mathcal{S}_{2m}$ is
on the boundary of $\mathcal{S}_2$ (this can be proved by contradiction).
In some cases, this proposition can be applied even if $\mathcal{S}_2$ is not
compact by enclosing it in a bigger compact set $\mathcal{S}_2'$ provided
that the minimum in (64) is the same for both sets.

The last two propositions demonstrate the key role of the 
maximum element in the uncertainty set: if it exists, a saddle- 
point exists, so it is a sufficient condition. It can be shown, 
via examples, that the absence of a maximum element may or 
may not result in the absence of a saddle-point, so there is no 
necessary condition here. 

D. Rank-Constrained Eavesdropper 

In this section, we consider the case where there is an extra
constraint on the rank $r(\mathbf{W}_2)$ of the eavesdropper channel
$\mathbf{W}_2$, $r(\mathbf{W}_2) \le r_2$ for given $r_2 < N_t$. This constraint is
motivated by the fact that $r(\mathbf{W}_2) \le N_2$ so that when the
number $N_2$ of eavesdropper antennas is small, $N_2 < N_t$,
full-rank $\mathbf{W}_2$ is not possible so that the results in Theorem 4 may
be too conservative². This applies in particular to a massive
MIMO case, where the transmitter is a base station with a large
number of antennas and the receiver/eavesdropper are handsets
with a small number of antennas (due to the size/complexity
constraints), so that $N_t \gg N_1, N_2$.

²This problem formulation was suggested by A. Khisti.

The eavesdropper uncertainty set is of the form

$$\mathcal{S}_{2a} = \{ \mathbf{W}_2 : |\mathbf{W}_2|_2 \le \epsilon,\ r(\mathbf{W}_2) \le r_2 \} \tag{67}$$

where the 1st inequality reflects the fact that the eavesdropper
channel gain is bounded (due to e.g. minimum propagation
path loss) and the 2nd one reflects the fact that the rank is
bounded due to e.g. a small number of eavesdropper antennas.
The compound secrecy capacity can now be characterized as
follows.

Theorem 5. Consider the compound Gaussian MIMO wiretap
channel in (35) with known $\mathbf{W}_1$ and unknown $\mathbf{W}_2$ belonging
to the uncertainty set $\mathcal{S}_{2a}$ in (67); assume that the rank of the
legitimate channel satisfies $r(\mathbf{W}_1) = r_1 \le r_2$. The compound
secrecy capacity $C_c$ of this channel is as follows:

$$C_c = \max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) = \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) = \sum_{i=1}^{r_1} \ln \frac{1 + g_i \lambda_i^*}{1 + \epsilon \lambda_i^*} \tag{68}$$

where max and min are over all admissible $\mathbf{R}, \mathbf{W}_2$: $\mathbf{R}, \mathbf{W}_2 \ge 0$,
$\mathrm{tr}\,\mathbf{R} \le P_T$, $\mathbf{W}_2 \in \mathcal{S}_{2a}$. The optimal signaling is
Gaussian and on the eigenmodes of the legitimate channel as in
(46), and $\lambda_i^*$ is as in (47). The worst-case eavesdropper is
$\mathbf{W}_2^* = \epsilon \mathbf{U}_{1a} \mathbf{U}_{1a}^{+}$, where the columns of semi-unitary matrix
$\mathbf{U}_{1a}$ are the eigenvectors of $\mathbf{W}_1$ corresponding to strictly
positive eigenvalues.
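The worst-case eavesdropper named in Theorem 5 can be constructed and checked numerically. The sketch below uses hypothetical dimensions, a randomly drawn rank-deficient legitimate channel and arbitrary (not optimized) eigenmode powers `lam`; it verifies only that, for signaling on the eigenvectors of $\mathbf{W}_1$, the rate against $\mathbf{W}_2^* = \epsilon \mathbf{U}_{1a}\mathbf{U}_{1a}^{+}$ reduces to a sum over the $r_1$ active eigenmodes.

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, r1, eps = 4, 2, 0.5      # hypothetical: 4 Tx antennas, rank-2 W1

# Legitimate channel with r1 receive antennas, hence rank(W1) = r1.
H1 = rng.standard_normal((r1, Nt)) + 1j * rng.standard_normal((r1, Nt))
W1 = H1.conj().T @ H1

g, U = np.linalg.eigh(W1)
order = np.argsort(g)[::-1]              # eigenvalues in decreasing order
g, U = g[order], U[:, order]
U1a = U[:, :r1]                          # eigenvectors with positive eigenvalues

W2_star = eps * U1a @ U1a.conj().T       # candidate worst-case eavesdropper

lam = np.array([1.5, 0.5])               # arbitrary eigenmode powers
R = U1a @ np.diag(lam) @ U1a.conj().T    # signaling on eigenmodes of W1

eye = np.eye(Nt)
lhs = (np.linalg.slogdet(eye + W1 @ R)[1]
       - np.linalg.slogdet(eye + W2_star @ R)[1])
rhs = np.sum(np.log((1 + g[:r1] * lam) / (1 + eps * lam)))

print("rank of W2*:", np.linalg.matrix_rank(W2_star))
print("spectral norm of W2*:", np.linalg.norm(W2_star, 2))
print("C(R, W2*) =", lhs, " per-eigenmode sum =", rhs)
```

The matrix `W2_star` has rank $r_1$ and spectral norm $\epsilon$, so it belongs to the set (67) whenever $r_1 \le r_2$.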

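Proof: First, observe that $\{\sigma_i(\mathbf{H}\mathbf{R}^{1/2})\}$ is weakly
majorized by $\{\sigma_i(\mathbf{H})\sigma_i(\mathbf{R}^{1/2})\}$ (see e.g. [42]), i.e.

$$\sum_{i=1}^{k} \sigma_i(\mathbf{H}\mathbf{R}^{1/2}) \le \sum_{i=1}^{k} \sigma_i(\mathbf{H}) \sigma_i(\mathbf{R}^{1/2}), \quad 1 \le k \le N_t \tag{69}$$

where all singular values $\sigma_i$ are in decreasing order. Therefore,

$$\ln |\mathbf{I} + \mathbf{W}_2 \mathbf{R}| = \sum_{i=1}^{r_2} \ln\big(1 + \sigma_i^2(\mathbf{H}_2 \mathbf{R}^{1/2})\big) \le \sum_{i=1}^{r_2} \ln\big(1 + \sigma_i^2(\mathbf{H}_2) \sigma_i^2(\mathbf{R}^{1/2})\big) \le \sum_{i=1}^{r_2} \ln\big(1 + \epsilon \lambda_i(\mathbf{R})\big) \tag{70}$$

where we have used the fact that $\sigma_i^2(\mathbf{R}^{1/2}) = \lambda_i(\mathbf{R})$, $\sigma_i^2(\mathbf{H}) = \lambda_i(\mathbf{W})$.
The 1st inequality is due to [42, Theorem 3.3.14]
and the fact that $\ln(1 + e^{x})$ is convex in $x$ and $\ln(1 + x^{2})$
is continuous, and the 2nd inequality is due to $\lambda_i(\mathbf{W}_2) \le |\mathbf{W}_2|_2 \le \epsilon$.
Similarly, we have

$$\ln |\mathbf{I} + \mathbf{W}_1 \mathbf{R}| \le \sum_{i=1}^{r_1} \ln\big(1 + \lambda_i(\mathbf{W}_1) \lambda_i(\mathbf{R})\big). \tag{71}$$

Using these two upper bounds and observing that the 2nd one
is achieved by using $\mathbf{R}$ with the same eigenvectors as those
of $\mathbf{W}_1$ and such choice of eigenvectors does not affect the
bound in (70), one obtains, using Theorem 3: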


$$C_c \ge \max_{\mathbf{R}} \min_{\mathbf{W}_2} C(\mathbf{R}, \mathbf{W}_2) \ge \max_{\mathbf{R}} \Big\{ \ln |\mathbf{I} + \mathbf{W}_1 \mathbf{R}| - \sum_{i=1}^{r_2} \ln\big(1 + \epsilon \lambda_i(\mathbf{R})\big) \Big\} = \max_{\lambda_i:\ \lambda_i \ge 0,\ \sum_i \lambda_i \le P_T}\ \sum_{i=1}^{r_1} \ln \frac{1 + \lambda_i(\mathbf{W}_1) \lambda_i}{1 + \epsilon \lambda_i} \tag{72}$$

where the sum is limited to $r_1$ due to $r_1 \le r_2$ so that, from
Corollary 1 in [13], $r(\mathbf{R}^*) \le r_1$. On the other hand, since the
worst-case capacity is not less than the compound one,

$$C_c \le C_w = \min_{\mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2) \le \max_{\mathbf{R}} C(\mathbf{R}, \mathbf{W}_2^*) = \max_{\lambda_i:\ \lambda_i \ge 0,\ \sum_i \lambda_i \le P_T}\ \sum_{i=1}^{r_1} \ln \frac{1 + \lambda_i(\mathbf{W}_1) \lambda_i}{1 + \epsilon \lambda_i} = C^*(\epsilon) \tag{73}$$

where the 2nd equality is due to the fact that when $\mathbf{W}_1$
and $\mathbf{W}_2$ have the same eigenvectors, signaling on those
eigenvectors is optimal (see [43, Proposition 1]). This proves
the saddle-point and thus establishes the capacity $C_c = C_w = C^*(\epsilon)$.
The optimal signaling follows from (70) and (71)
where the equalities are attained by $\mathbf{W}_2 = \epsilon \mathbf{U}_{1a} \mathbf{U}_{1a}^{+}$ and
$\mathbf{R} = \mathbf{U}_{1a} \mathbf{\Lambda} \mathbf{U}_{1a}^{+}$, which also attains the equalities in (72) and
(73). ■

Remark 11. Note that the worst-case eavesdropper $\mathbf{W}_2^* = \epsilon \mathbf{U}_{1a} \mathbf{U}_{1a}^{+}$
is “isotropic” on the sub-space spanned by the
columns of $\mathbf{U}_{1a}$ (but not on the whole space), which is known
as “omni-directional” in the antenna literature [44] (i.e. having
the same gain in all directions of that sub-space). Comparing
Theorems 4 and 5, one concludes that the eavesdropper rank
constraint has no effect on the capacity and optimal signaling
provided that $r_1 \le r_2$ holds.

Remark 12. Unlike the rank-unconstrained case, there is no
dominant channel in the rank-constrained uncertainty set, i.e.
$\mathbf{W}_2 \le \mathbf{W}_2^*$ does not hold for all $\mathbf{W}_2 \in \mathcal{S}_{2a}$, so that the
uncertainty set is not “degraded” (with respect to $\mathbf{W}_2^*$ or any
other $\mathbf{W}_2$). Since the set $\mathcal{S}_{2a}$ is not convex either, one cannot
use the von Neumann mini-max theorem (or its extensions) to
infer the existence of a saddle-point, which is established in
(68) via the singular value inequalities, so that the following
inequalities hold for any feasible $\mathbf{R}$ and $\mathbf{W}_2$,

$$C(\mathbf{R}, \mathbf{W}_2^*) \le C_c = C_w = C(\mathbf{R}^*, \mathbf{W}_2^*) \le C(\mathbf{R}^*, \mathbf{W}_2) \tag{74}$$

where $(\mathbf{R}^*, \mathbf{W}_2^*)$ is the saddle-point. It can be demonstrated
(via examples) that the saddle-point property does not hold if
$r_1 > r_2$.

Remark 13. The condition on the ranks $r_1 \le r_2$ is ensured if
$N_1 \le N_2$ and both channels are of full row rank. In particular,
this holds if $N_1 = N_2 = 1$.

VI. Double-Sided Channel Uncertainty

Here we consider the case where both the legitimate and
eavesdropper channels are uncertain. The compound channel
model follows the model in (35) where:

$$\mathcal{S}_1 = \{ \mathbf{H}_1 : \mathbf{H}_1 = \mathbf{H}_0 + \Delta\mathbf{H},\ |\Delta\mathbf{H}|_2 \le \delta \} \tag{75a}$$

$$\mathcal{S}_2 = \{ \mathbf{W}_2 : |\mathbf{W}_2|_2 \le \epsilon \} \tag{75b}$$

where $\mathbf{H}_0$ is the nominal part of $\mathbf{H}_1$ known to the transmitter,
and $\Delta\mathbf{H}$ is the uncertain, unknown part; $|\Delta\mathbf{H}|_2 = \sigma_1(\Delta\mathbf{H})$
is the spectral norm of $\Delta\mathbf{H}$, i.e. the largest singular value
$\sigma_1(\Delta\mathbf{H})$. The uncertainty of $\mathbf{W}_2$ follows the same model as
in (37). This compound model reflects two important points:

1) The desire of the eavesdropper to be confidential to
keep its spying abilities uncompromised, so it does not
share its channel with the transmitter and therefore only
minimal information about $\mathbf{H}_2$ is available to the latter.

2) The legitimate receiver, on the other hand, wishes to
maximize the rate so it shares its channel with the
transmitter. Its channel uncertainty is due to the
limitations of the feedback and estimation procedure, which
is normally much smaller than that of the eavesdropper
(and hence the known nominal part).

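The secrecy capacity of this compound channel can be characterized as follows. For this purpose we define

$$C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}) = \ln\frac{|\mathbf{I} + \mathbf{W}_1\mathbf{R}|}{|\mathbf{I} + \mathbf{W}_2\mathbf{R}|}$$

which depends on the transmit covariance matrix $\mathbf{R}$ and the unknown channels $\mathbf{W}_1 = \mathbf{H}_1^+\mathbf{H}_1$ and $\mathbf{W}_2 = \mathbf{H}_2^+\mathbf{H}_2$ to the legitimate receiver and the eavesdropper respectively.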


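Theorem 6. Consider the compound Gaussian MIMO wiretap channel in (35) when $\mathbf{W}_1$ and $\mathbf{W}_2$ are unknown and belong to the uncertainty sets $\mathcal{S}_1$ and $\mathcal{S}_2$ in (75). Then, the compound secrecy capacity $C_c$ is

$$\begin{aligned}
C_c &= \max_{\mathbf{R}} \min_{\mathbf{W}_1, \mathbf{W}_2} C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}) \\
&= \min_{\mathbf{W}_1, \mathbf{W}_2} \max_{\mathbf{R}} C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}) = C_w \\
&= C(\mathbf{W}_{1w}, \mathbf{W}_{2w}, \mathbf{R}^*),
\end{aligned} \tag{76}$$

i.e., the worst-case secrecy capacity $C_w$ is also the (compound) secrecy capacity $C_c$ of the class of channels and Gaussian signaling is optimal. The saddle-point property holds,

$$C(\mathbf{W}_{1w}, \mathbf{W}_{2w}, \mathbf{R}) \le C_c = C(\mathbf{W}_{1w}, \mathbf{W}_{2w}, \mathbf{R}^*) \le C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}^*), \tag{77}$$

where $(\mathbf{W}_{1w}, \mathbf{W}_{2w}, \mathbf{R}^*)$ is the saddle-point. The worst-case channel is

$$\mathbf{W}_{1w} = \mathbf{H}_{1w}^+\mathbf{H}_{1w}, \quad \mathbf{H}_{1w} = \mathbf{V}_0(\mathbf{\Sigma}_0 - \epsilon_1\mathbf{I})_+\mathbf{U}_0^+, \quad \mathbf{W}_{2w} = \epsilon\mathbf{I}, \tag{78}$$

where $\mathbf{U}_0$, $\mathbf{V}_0$ are unitary matrices of right and left singular vectors of the nominal channel $\mathbf{H}_0$ and $\mathbf{\Sigma}_0$ is the diagonal matrix of its singular values. The optimal covariance $\mathbf{R}^*$ is as in Theorem 4 with the substitution

$$\lambda_i(\mathbf{W}_1) \to (\sigma_i(\mathbf{H}_0) - \epsilon_1)_+^2, \quad \mathbf{U}_1 \to \mathbf{U}_0, \tag{79}$$

i.e., the optimal signaling is on the eigenmodes of the degraded nominal channel $\mathbf{H}_{1w}$, and the worst-case eavesdropper is isotropic.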

Proof: The proof can be found in Appendix D. ■ 

Note that this theorem does not require the compound 
channel to be degraded. Remarkably, the saddle-point property 
still holds and the isotropic eavesdropper (of the maximum 
gain) is still the worst-case one, even under the legitimate 
channel uncertainty, and the optimal signaling is almost the 
same as in Theorem 4 (Gaussian signaling is still optimal), 
with the legitimate channel substituted by its degraded (due 
to uncertainty) version. We observe that, as the uncertainty (i.e. $\epsilon_1$ and/or $\epsilon$) increases, fewer and fewer eigenmodes are used until only the strongest one remains active, in which case beamforming is optimal (see (59)). From this perspective, beamforming is the most robust strategy.

The game-theoretic interpretation of the inequalities in (77) is the same as for the single-sided uncertainty: $\{\mathbf{W}_{1w}, \epsilon\mathbf{I}, \mathbf{R}^*\}$ is a saddle-point in the matrix game between the transmitter on one side and the eavesdropper and nature on the other; neither can deviate from the optimal strategy without incurring a penalty provided that the other player follows the strategy.
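The loss of eigenmodes under growing uncertainty can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the function names, the nominal singular values and the uncertainty levels are hypothetical, and the code does not compute the optimal power allocation of Theorem 4. It evaluates the per-eigenmode objective of (95) with the worst-case gains $(\sigma_i(\mathbf{H}_0) - \epsilon_1)_+^2$ from (79), and lists the eigenmodes whose worst-case gain exceeds $\epsilon$. A mode whose gain does not exceed $\epsilon$ contributes a non-positive term for any power, so it cannot be active.

```python
import math

def effective_gains(sigma0, eps1):
    """Worst-case legitimate gains (sigma_i(H0) - eps1)_+^2, cf. (78)-(79)."""
    return [max(s - eps1, 0.0) ** 2 for s in sigma0]

def secrecy_objective(sigma0, eps1, eps, lam):
    """Sum in (95) for a given (not necessarily optimal) allocation lam, in nats."""
    gains = effective_gains(sigma0, eps1)
    return sum(math.log((1 + g * l) / (1 + eps * l)) for g, l in zip(gains, lam))

def candidate_modes(sigma0, eps1, eps):
    """Indices of eigenmodes that can contribute a positive rate (gain > eps)."""
    return [i for i, g in enumerate(effective_gains(sigma0, eps1)) if g > eps]

# Hypothetical nominal singular values and eavesdropper gain bound.
sigma0 = [3.0, 2.0, 1.0]
eps = 0.5

print(candidate_modes(sigma0, 0.0, eps))  # [0, 1, 2]
print(candidate_modes(sigma0, 0.5, eps))  # [0, 1]
print(candidate_modes(sigma0, 1.5, eps))  # [0]
```

With these example numbers, raising $\epsilon_1$ removes the weakest modes first until only the strongest one remains, which matches the beamforming regime described above.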

A. Rank-Constrained Eavesdropper 

Using similar arguments, Theorem 6 can be extended to the rank-constrained eavesdropper channel,

$$\mathcal{S}_1 = \{\mathbf{H}_1 : \mathbf{H}_1 = \mathbf{H}_0 + \Delta\mathbf{H},\ |\Delta\mathbf{H}|_2 \le \epsilon_1\} \tag{80a}$$
$$\mathcal{S}_2 = \{\mathbf{H}_2 : |\mathbf{H}_2|_2 \le \epsilon,\ r(\mathbf{H}_2) \le r_2\} \tag{80b}$$

where the eavesdropper rank is constrained by $r_2$ (due to e.g. a limited number of antennas).

Theorem 7. Consider the compound Gaussian MIMO wiretap channel in (35) when $\mathbf{H}_1$ and $\mathbf{H}_2$ are unknown and belong to the uncertainty sets $\mathcal{S}_1$ and $\mathcal{S}_2$ in (80). Assume that $r(\mathbf{H}_0) = r_1 \le r_2$. Then, the compound secrecy capacity $C_c$ is as in (76); the saddle-point property in (77) holds and the worst-case channel $\mathbf{W}_{1w}$ is as in (78) while $\mathbf{W}_{2w}$ is

$$\mathbf{W}_{2w} = \epsilon^2\mathbf{U}_{0a}\mathbf{U}_{0a}^+, \quad \mathbf{H}_{2w} = \mathbf{V}\mathbf{\Sigma}_{2w}\mathbf{U}_0^+, \tag{81}$$

where $\mathbf{V}$ is an arbitrary unitary matrix, the semi-unitary matrix $\mathbf{U}_{0a}$ collects the columns of $\mathbf{U}_0$ corresponding to strictly positive singular values, and

$$\mathbf{\Sigma}_{2w} = \mathrm{diag}\{\epsilon, .., \epsilon, 0, .., 0\} \tag{82}$$

is a diagonal matrix with the 1st $r_1$ diagonal entries being $\epsilon$ and 0 otherwise. The optimal covariance $\mathbf{R}^*$ is as in Theorem 6, i.e. the optimal signaling is Gaussian and on the eigenmodes of the worst-case legitimate channel $\mathbf{H}_{1w}$.

Proof: The proof can be found in Appendix E. ■


VII. Weak vs. Strong Secrecy 


The results above have been established under the strong secrecy condition. It was demonstrated in [27,28] that, for regular (single-state or known) channels, strong and weak secrecy capacities are the same. That result, however, does not immediately apply to the compound setting here. Nevertheless, it can be shown that the weak $C_c^{weak}$ and strong $C_c^{strong}$ compound secrecy capacities are the same,

$$C_c^{weak} = C_c^{strong} \tag{83}$$

if the saddle-point property holds under strong secrecy, i.e. $C_w = C_c^{strong}$. Indeed, under the saddle-point property,

$$C_w = C_c^{strong} \le C_c^{weak} \le C_w \tag{84}$$

from which (83) follows, where we have used the fact that the worst-case capacity is the same under the strong and weak secrecies, and that the strong compound secrecy capacity is not larger than the weak one. In particular, the results in Theorems 4, 5 and Proposition 5 also hold under weak secrecy, so that one can go from weak to strong secrecy for free in the compound settings as well under the saddle-point property.

In fact, the chain argument in (84) has the following 
implications: 


• the saddle point under strong secrecy ($C_w = C_c^{strong}$) implies a saddle point under weak secrecy ($C_w = C_c^{weak}$);

• no saddle point under weak secrecy ($C_w > C_c^{weak}$) implies no saddle point under strong secrecy ($C_w > C_c^{strong}$).


VIII. Conclusion 

The secrecy capacity of compound wiretap channels has 
been studied. First, the achievable strong secrecy rate of 
finite-state compound channels under finite alphabets in [18] 
was extended to arbitrary uncertainty sets (not necessarily 
countable or finite-state) and then to continuous input/output 
alphabets and arbitrary compact uncertainty sets. Based on 
this, the (strong) secrecy capacity of the compound Gaussian 
MIMO wiretap channel has been established under the spectral 
norm constraint on the eavesdropper channel. The channel is 
not required to be degraded. The optimal signaling as well as 
the secrecy capacity are given in a closed form. The saddle-point property has been shown to hold, so that the compound capacity equals the worst-case one and signaling on the worst-case channel achieves the compound capacity. The isotropic eavesdropper is the worst-case one and signaling on the eigenmodes of the legitimate channel is optimal. The results are extended to non-isotropic uncertainty sets. It is shown that the existence of a maximum element in the uncertainty set is sufficient for a saddle-point to exist, so that the compound capacity equals the worst-case one and signaling on the worst-case channel achieves the capacity of the whole class of channels.
Finally, these results are extended to include the legitimate 
channel uncertainty. 

While the results above have been established under the total power constraint $\mathrm{tr}\,\mathbf{R} \le P_T$, using similar reasoning it can be shown that the same result holds under a general power constraint of the form $\mathbf{R} \in \mathcal{S}_R$, where $\mathcal{S}_R$ is a unitarily invariant set of positive semi-definite matrices, i.e. $\mathbf{R} \in \mathcal{S}_R$ implies $\mathbf{U}\mathbf{R}\mathbf{U}^+ \in \mathcal{S}_R$ for any unitary $\mathbf{U}$. This constraint limits possible eigenvalues of $\mathbf{R}$ but does not constrain in any way its eigenvectors. Special cases include the total and maximum per-eigenmode power constraints (either alone or in combination with each other).
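As a quick illustration of why the total and maximum per-eigenmode power constraints are unitarily invariant, the sketch below checks numerically that $\mathrm{tr}\,\mathbf{R}$ and the eigenvalues of $\mathbf{R}$ are unchanged under $\mathbf{R} \to \mathbf{U}\mathbf{R}\mathbf{U}^+$. It is a minimal example under stated assumptions: it uses a real 2x2 rotation as a special case of a unitary matrix, and the helper names and the matrix entries are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def eig_sym2(m):
    """Eigenvalues (descending) of a real symmetric 2x2 matrix."""
    t = m[0][0] + m[1][1]
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    s = math.sqrt(max(t * t / 4 - d, 0.0))
    return (t / 2 + s, t / 2 - s)

def rotate(r, theta):
    """Return U R U^T for the rotation U by angle theta (a real unitary matrix)."""
    c, s = math.cos(theta), math.sin(theta)
    u = [[c, -s], [s, c]]
    return matmul(matmul(u, r), transpose(u))

# Hypothetical covariance matrix and rotation angle.
R = [[2.0, 0.5], [0.5, 1.0]]
R_rot = rotate(R, 0.7)

trace = R[0][0] + R[1][1]
trace_rot = R_rot[0][0] + R_rot[1][1]
print(abs(trace - trace_rot) < 1e-12)
print(all(abs(x - y) < 1e-9 for x, y in zip(eig_sym2(R), eig_sym2(R_rot))))
```

Since both quantities depend on the eigenvalues only, any constraint expressed through them leaves the eigenvectors of $\mathbf{R}$ free, which is the property the argument above relies on.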

Appendix 

A. Proof of Lemma 1 

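It is known that the secrecy capacity of a wiretap channel depends only on its marginal channels and not on its joint probability distribution^, cf. for instance [3, Lemma 2.1]. Therefore, it suffices to find good approximations $(\tilde{W}_s, \tilde{V}_s)$ for the marginals $(W_s, V_s)$ only, which simplifies the task significantly. To this end, using [15, Lemma 4] for both marginal channels, one obtains approximations that satisfy

$$\begin{aligned}
|W_s(y|x) - \tilde{W}_s(y|x)| &\le |\mathcal{Y}|/L \le |\mathcal{Y}||\mathcal{Z}|/L, \\
|V_s(z|x) - \tilde{V}_s(z|x)| &\le |\mathcal{Z}|/L \le |\mathcal{Y}||\mathcal{Z}|/L, \\
W_s(y|x) &\le 2^{2|\mathcal{Y}|^2/L}\, \tilde{W}_s(y|x) \le 2^{2|\mathcal{Y}|^2|\mathcal{Z}|^2/L}\, \tilde{W}_s(y|x), \\
V_s(z|x) &\le 2^{2|\mathcal{Z}|^2/L}\, \tilde{V}_s(z|x) \le 2^{2|\mathcal{Y}|^2|\mathcal{Z}|^2/L}\, \tilde{V}_s(z|x)
\end{aligned}$$

for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, and $z \in \mathcal{Z}$, and further for any input distribution $P_X \in \mathcal{P}(\mathcal{X})$

$$\begin{aligned}
|I(X;Y_s) - I(X;\tilde{Y}_s)| &\le 2|\mathcal{Y}|^{3/2}/L^{1/2} \le 2(|\mathcal{Y}||\mathcal{Z}|)^{3/2}/L^{1/2}, \\
|I(X;Z_s) - I(X;\tilde{Z}_s)| &\le 2|\mathcal{Z}|^{3/2}/L^{1/2} \le 2(|\mathcal{Y}||\mathcal{Z}|)^{3/2}/L^{1/2}.
\end{aligned}$$

Note that in the first step, the application of [15, Lemma 4] yields bounds whose constants are different and depend on their own alphabet size, i.e., either on $|\mathcal{Y}|$ or on $|\mathcal{Z}|$, which is difficult to use in the following analysis. The 2nd step results in bounds with the same constant, which facilitates the further analysis. ■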


B. Proof of Lemma 2 

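The first property (13) follows by observing that for all $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$ we have

$$W_s^n(y^n|x^n) = \prod_{i=1}^{n} W_s(y_i|x_i) \le 2^{n\frac{2|\mathcal{Y}|^2|\mathcal{Z}|^2}{L}} \prod_{i=1}^{n} \tilde{W}_s(y_i|x_i) = 2^{n\frac{2|\mathcal{Y}|^2|\mathcal{Z}|^2}{L}}\, \tilde{W}_s^n(y^n|x^n)$$

which naturally extends to decoding sets, i.e. $W_s^n(\mathcal{D}|x^n) \le 2^{n\frac{2|\mathcal{Y}|^2|\mathcal{Z}|^2}{L}}\, \tilde{W}_s^n(\mathcal{D}|x^n)$ for any decoding set $\mathcal{D} \subseteq \mathcal{Y}^n$, and likewise for the error probability.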

The more interesting part is the robustness of the secrecy 
constraint. Following the classical approach in [15, Lemma 4] 
would lead to a bound which is too loose to prove what we 
aim for, cf. also Remark 4. Therefore, we make use of a recent 
result in [29, Lemma 2]. 


Lemma 5. Let $\mathcal{X}$ and $\mathcal{Y}$ be finite alphabets and $W, \tilde{W} : \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ be arbitrary channels with

$$\max_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} |W(y|x) - \tilde{W}(y|x)| \le \epsilon \tag{85}$$

for some $\epsilon > 0$. For arbitrary $n \in \mathbb{N}$, let $\mathcal{U}$ be an arbitrary finite set, $P_U \in \mathcal{P}(\mathcal{U})$ the uniform distribution on $\mathcal{U}$, and $E(x^n|u)$, $x^n \in \mathcal{X}^n$, an arbitrary stochastic encoder, cf. (1). We consider the probability distributions

$$\begin{aligned}
P_{UY^n}(u, y^n) &= \sum_{x^n \in \mathcal{X}^n} W^n(y^n|x^n) E(x^n|u) P_U(u) \\
\tilde{P}_{UY^n}(u, y^n) &= \sum_{x^n \in \mathcal{X}^n} \tilde{W}^n(y^n|x^n) E(x^n|u) P_U(u).
\end{aligned}$$

Then it holds that

$$|I(U;Y^n\|P) - I(U;Y^n\|\tilde{P})| \le 4n(\epsilon\log|\mathcal{Y}| + H_2(\epsilon)) \tag{86}$$

where $I(U;Y^n\|P)$ means that the mutual information is evaluated under the joint probability distribution $P$.

^In particular, two wiretap channels with different joint probability distributions will have the same secrecy capacity if they share the same marginal channel probabilities.

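Proof: The proof is based on the technique developed for quantum channels in [30] and can be found in Appendix B of [29]. ■

Note that this lemma must be applied carefully: In the problem at hand, the channels $V_s$ and $\tilde{V}_s$ satisfy $|V_s(z|x) - \tilde{V}_s(z|x)| \le |\mathcal{Y}||\mathcal{Z}|/L$ for all $x \in \mathcal{X}$ and $z \in \mathcal{Z}$, cf. (11a)-(11b). Thus, (85) is satisfied with $\epsilon = |\mathcal{Y}||\mathcal{Z}|^2/L$ which then yields the desired result, i.e.,

$$|I(M; Z_s^n) - I(M; \tilde{Z}_s^n)| \le 4n\left(\frac{|\mathcal{Y}||\mathcal{Z}|^2}{L}\log|\mathcal{Z}| + H_2\!\left(\frac{|\mathcal{Y}||\mathcal{Z}|^2}{L}\right)\right) \tag{87}$$

This completes the proof. ■

C. Proof of Proposition 7

The following lemma is instrumental.

Lemma 6. Let $\mathbf{W}_1, \mathbf{W}_2, ...$ be a bounded and increasing sequence of positive semi-definite matrices, i.e.

$$0 \le \mathbf{W}_1 \le \mathbf{W}_2 \le .. \le \mathbf{W}_i \le ... \le a\mathbf{I} \tag{88}$$

where $0 < a < \infty$ is a positive constant. This sequence converges.

Proof: Consider the following sequence of (non-negative) scalars $\alpha_i = \mathbf{x}^+\mathbf{W}_i\mathbf{x}$, where $\mathbf{x}$ is a vector of appropriate size; for convenience, we take $|\mathbf{x}| = 1$. Since $\{\mathbf{W}_i\}$ is an increasing and bounded sequence, so is $\{\alpha_i\}$,

$$0 \le \alpha_1 \le \alpha_2 \le .. \le \alpha_i \le ... \le a \tag{89}$$

and therefore it converges to some non-negative number $b(\mathbf{x}) = \lim_{i\to\infty}\alpha_i \le a$. Hence, for any $\epsilon > 0$, there is such $n(\epsilon, \mathbf{x})$ that $b(\mathbf{x}) - \alpha_i < \epsilon\ \forall i > n(\epsilon, \mathbf{x})$. Since this is true for any $\mathbf{x}$, take $n(\epsilon) = \max_{\mathbf{x}} n(\epsilon, \mathbf{x})$ and observe that $|b(\mathbf{x}) - \alpha_i| < \epsilon\ \forall i > n(\epsilon)$ and all $\mathbf{x}$. It follows that $\{\alpha_i\}$ is a Cauchy sequence, i.e. $|\alpha_j - \alpha_i| < \epsilon\ \forall i, j > n(\epsilon)$ and all $\mathbf{x}$, i.e.

$$\mathbf{x}^+(\mathbf{W}_j - \mathbf{W}_i)\mathbf{x} < \epsilon \quad \forall\mathbf{x}$$

from which it follows that $\lambda_1(\mathbf{W}_j - \mathbf{W}_i) < \epsilon$ and thus $\|\mathbf{W}_j - \mathbf{W}_i\| \to 0$ in any norm (since all norms are equivalent [39]), i.e. $\{\mathbf{W}_i\}$ is a Cauchy sequence and thus converges [45,46], $\mathbf{W}_i \to \mathbf{W} \le a\mathbf{I}$. Taking the Frobenius norm, one obtains element-wise convergence of this matrix sequence. ■

Note that this result generalizes to matrices the well-known fact that any scalar increasing and bounded sequence converges.

To proceed further, observe from the definition of $\mathcal{S}_{2m}$ that

$$\min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2) \le \min_{\mathbf{W}_2 \in \mathcal{S}_{2m}} C(\mathbf{R}, \mathbf{W}_2). \tag{90}$$

We prove the equality by contradiction. Assume that

$$\min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2) < \min_{\mathbf{W}_2 \in \mathcal{S}_{2m}} C(\mathbf{R}, \mathbf{W}_2) \tag{91}$$

and let $\mathbf{W}_2^* = \arg\min_{\mathbf{W}_2 \in \mathcal{S}_2} C(\mathbf{R}, \mathbf{W}_2)$ be a minimizer over $\mathcal{S}_2$. Then, $\mathbf{W}_2^* \notin \mathcal{S}_{2m}$ (due to the strict inequality) so that there exists $\mathbf{W}_{21} \in \mathcal{S}_2$ such that $\mathbf{W}_{21} \ge \mathbf{W}_2^*$ (otherwise $\mathbf{W}_2^*$ were in $\mathcal{S}_{2m}$), $\mathbf{W}_{21} \ne \mathbf{W}_2^*$, and $C(\mathbf{R}, \mathbf{W}_{21}) \le C(\mathbf{R}, \mathbf{W}_2^*)$. If $\mathbf{W}_{21} \in \mathcal{S}_{2m}$, we have a contradiction:

$$C(\mathbf{R}, \mathbf{W}_{21}) \le C(\mathbf{R}, \mathbf{W}_2^*) < \min_{\mathbf{W}_2 \in \mathcal{S}_{2m}} C(\mathbf{R}, \mathbf{W}_2) \le C(\mathbf{R}, \mathbf{W}_{21}). \tag{92}$$

Assume further that $\mathbf{W}_{21} \notin \mathcal{S}_{2m}$ so that there exists such $\mathbf{W}_{22} \in \mathcal{S}_2$ that $\mathbf{W}_{22} \ge \mathbf{W}_{21}$, $\mathbf{W}_{22} \ne \mathbf{W}_{21}$, and the process is repeated. In this way, we construct a non-decreasing, bounded sequence $\{\mathbf{W}_2^*, \mathbf{W}_{21}, ..., \mathbf{W}_{2i}, ...\}$, which either terminates in a finite number of steps (when some $\mathbf{W}_{2k} \in \mathcal{S}_{2m}$ so we cannot find a greater one) or continues indefinitely. In the first case, we have a contradiction and thus the assertion is proved.

In the second case, we claim that the sequence will converge to some $\mathbf{W} \in \mathcal{S}_{2m}$. To see this, first observe that this sequence will converge to some $\mathbf{W} \in \mathcal{S}_2$ (due to Lemma 6, since $\mathcal{S}_2$ is bounded and closed and thus compact and the sequence is increasing and bounded; the boundedness can be understood in any norm, since all matrix norms are equivalent). Thus, we have to prove that $\mathbf{W} \in \mathcal{S}_{2m}$. To see this, first observe that $\mathbf{W} \ge \mathbf{W}_{2i}\ \forall i$ (since the sequence is increasing). If $\mathbf{W} \notin \mathcal{S}_{2m}$, then there exists $\mathbf{W}^* \in \mathcal{S}_2$ such that $\mathbf{W}^* \ge \mathbf{W} \ge \mathbf{W}_{2i}$ so it can be taken as a part of the constructed sequence and thus $\mathbf{W}$ cannot be its limit, a contradiction. Therefore, $\mathbf{W} \in \mathcal{S}_{2m}$, as claimed. This, however, results in a contradiction to (91) so that (64) holds. To see (65), take $\max_{\mathbf{R}}$ in (90)-(92) and apply the same argument. ■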


D. Proof of Theorem 6 
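First, we observe that

$$C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}) \ge C(\mathbf{W}_1, \epsilon\mathbf{I}, \mathbf{R}) \quad \forall\mathbf{R}, \mathbf{W}_1, \tag{93}$$

since $\mathbf{W}_2 \le \epsilon\mathbf{I}$ (which follows from $|\mathbf{W}_2|_2 \le \epsilon$) and $|\mathbf{I} + \mathbf{W}\mathbf{R}|$ is monotonically increasing in $\mathbf{W}$ for any (positive semi-definite) $\mathbf{R}$. The lower bound is achieved by $\mathbf{W}_2 = \epsilon\mathbf{I}$. Therefore,

$$\min_{\mathbf{W}_2} C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}) = C(\mathbf{W}_1, \epsilon\mathbf{I}, \mathbf{R}) \quad \forall\mathbf{R}, \mathbf{W}_1, \tag{94}$$

and also

$$\begin{aligned}
C_w &= \min_{\mathbf{W}_1} \max_{\mathbf{R}} C(\mathbf{W}_1, \epsilon\mathbf{I}, \mathbf{R}) \\
&= \min_{\mathbf{W}_1} \max_{\mathbf{R}} \ln\frac{|\mathbf{I} + \mathbf{W}_1\mathbf{R}|}{|\mathbf{I} + \epsilon\mathbf{R}|} \\
&\overset{(a)}{=} \min_{\mathbf{W}_1} \max_{\mathbf{R}} \sum_i \ln\frac{1 + \lambda_i(\mathbf{W}_1)\lambda_i(\mathbf{R})}{1 + \epsilon\lambda_i(\mathbf{R})} \\
&\overset{(b)}{=} \max_{\{\lambda_i\}} \sum_i \ln\frac{1 + (\sigma_i(\mathbf{H}_0) - \epsilon_1)_+^2\lambda_i}{1 + \epsilon\lambda_i} \\
&= C(\mathbf{W}_{1w}, \epsilon\mathbf{I}, \mathbf{R}^*)
\end{aligned} \tag{95}$$

where (a) follows from the inequality

$$|\mathbf{I} + \mathbf{W}_1\mathbf{R}| \le \prod_i (1 + \lambda_i(\mathbf{W}_1)\lambda_i(\mathbf{R})) \tag{96}$$

which follows from [42, Theorem 3.3.14(c)] with $f(x) = \ln(1 + x)$, where $\lambda_i(\mathbf{W}_1)$, $\lambda_i(\mathbf{R})$ are ordered likewise and the equality is achieved when $\mathbf{W}_1$, $\mathbf{R}$ have the same eigenvectors; (b) follows from the inequality $\sigma_i(\mathbf{H}_1) \ge (\sigma_i(\mathbf{H}_0) - \sigma_1(\Delta\mathbf{H}))_+$ (see e.g. [39,42]) and $\lambda_i(\mathbf{W}_1) = \sigma_i^2(\mathbf{H}_1)$, where the equality is achieved by $\mathbf{H}_1 = \mathbf{H}_{1w}$ in (78).

We further observe that the saddle-point property in (76) is equivalent to (see e.g. [33])

$$C(\mathbf{W}_{1w}, \epsilon\mathbf{I}, \mathbf{R}) \overset{(a)}{\le} C(\mathbf{W}_{1w}, \epsilon\mathbf{I}, \mathbf{R}^*) \overset{(b)}{\le} C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}^*) \tag{97}$$

and we prove these inequalities below thus establishing (76).

Note that (a) follows from (95) (since $\mathbf{R}^*$ is the optimal covariance for $\mathbf{W}_1 = \mathbf{W}_{1w}$, $\mathbf{W}_2 = \epsilon\mathbf{I}$). To prove (b), we need the following technical lemma, which is an extension of well-known singular value inequalities for a sum and a product of two matrices (see e.g. [39,42]):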

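Lemma 7. Let $\mathbf{A}$ and $\mathbf{B}$ be $n \times m$ matrices and $\mathbf{C}$ be an $m \times m$ matrix, and let the right singular vectors of $\mathbf{A}$ be the same as the left singular vectors of $\mathbf{C}$ so that their singular value decompositions (SVD) are $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}_a\mathbf{V}^+$ and $\mathbf{C} = \mathbf{V}\mathbf{\Sigma}_c\mathbf{W}^+$, where $\mathbf{U}, \mathbf{V}, \mathbf{W}$ are unitary and $\mathbf{\Sigma}_a = \mathrm{diag}\{\sigma_{ai}\}$, $\mathbf{\Sigma}_c = \mathrm{diag}\{\sigma_{ci}\}$ are "diagonal" matrices of singular values of $\mathbf{A}$ and $\mathbf{C}$. Assume that $\{\sigma_{ai}\}$ and $\{\sigma_{ci}\}$ are in decreasing order. Then,

$$\sigma_i((\mathbf{A} + \mathbf{B})\mathbf{C}) \ge (\sigma_i(\mathbf{A}) - \sigma_1(\mathbf{B}))_+\sigma_i(\mathbf{C}) \tag{98}$$

where $\sigma_i((\mathbf{A} + \mathbf{B})\mathbf{C})$ are also in decreasing order. The equality is achieved by $\mathbf{B} = -\mathbf{U}\mathbf{\Sigma}_b\mathbf{V}^+$, where $\mathbf{\Sigma}_b = \mathrm{diag}\{\min(\sigma_i(\mathbf{A}), \epsilon)\}$.

Proof: The proof is based on the variational characterization of singular values, see [47] for details. ■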
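The inequality (98) can be sanity-checked numerically on a small example. The sketch below is only a spot check, not a proof: it assumes real 2x2 matrices, takes $\mathbf{A}$ and $\mathbf{C}$ diagonal with decreasing entries (so that the alignment condition of the lemma holds trivially with $\mathbf{U} = \mathbf{V} = \mathbf{W} = \mathbf{I}$), and uses hypothetical helper names and matrix entries.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def singular_values2(m):
    """Singular values (descending) of a real 2x2 matrix.

    They are the square roots of the eigenvalues of m^T m, whose trace is the
    squared Frobenius norm of m and whose determinant is det(m)^2.
    """
    t = sum(m[i][j] ** 2 for i in range(2) for j in range(2))
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    s = math.sqrt(max(t * t / 4 - det * det, 0.0))
    return (math.sqrt(t / 2 + s), math.sqrt(max(t / 2 - s, 0.0)))

# Hypothetical example: A and C are diagonal with decreasing singular values.
A = [[3.0, 0.0], [0.0, 1.0]]
C = [[2.0, 0.0], [0.0, 0.5]]
B = [[0.2, -0.3], [0.1, 0.4]]  # arbitrary perturbation

A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
lhs = singular_values2(matmul2(A_plus_B, C))
sigma1_B = singular_values2(B)[0]
rhs = [max(sa - sigma1_B, 0.0) * sc
       for sa, sc in zip(singular_values2(A), singular_values2(C))]

print(all(l >= r for l, r in zip(lhs, rhs)))
```

In this example both singular values of $(\mathbf{A} + \mathbf{B})\mathbf{C}$ stay above the corresponding right-hand sides of (98); other perturbations $\mathbf{B}$ can be tried by editing the entries.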

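Using this lemma, one obtains:

$$\begin{aligned}
C(\mathbf{W}_{1w}, \epsilon\mathbf{I}, \mathbf{R}^*) &\overset{(a)}{=} \sum_i \ln\frac{1 + (\sigma_i(\mathbf{H}_0) - \epsilon_1)_+^2\lambda_i^*}{1 + \epsilon\lambda_i^*} \\
&\overset{(b)}{\le} C(\mathbf{W}_1, \epsilon\mathbf{I}, \mathbf{R}^*) \\
&\overset{(c)}{\le} C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}^*)
\end{aligned} \tag{99}$$

where (a) follows from (95), (b) follows from Lemma 7 applied to $\mathbf{A} = \mathbf{H}_0$, $\mathbf{B} = \Delta\mathbf{H}$, $\mathbf{C} = \mathbf{R}^{*1/2}$ (and observing, from (47), that the singular values of $\mathbf{H}_0$ and $\mathbf{R}^{*1/2}$ are ordered likewise), where we have used $\lambda_i(\mathbf{R}) = \sigma_i^2(\mathbf{R}^{1/2})$, and (c) follows from (93). This establishes (97) and thus (76). ■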
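The monotonicity step (93) used in this proof can also be spot-checked numerically. The following is a minimal sketch under stated assumptions: real symmetric 2x2 matrices, hypothetical helper names and example entries, and a single test point rather than a proof. It verifies that an eavesdropper matrix with spectral norm at most $\epsilon$ gives a secrecy objective no smaller than the isotropic one $\epsilon\mathbf{I}$.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det_i_plus(w, r):
    """Determinant |I + W R| for 2x2 matrices."""
    m = matmul2(w, r)
    return (1 + m[0][0]) * (1 + m[1][1]) - m[0][1] * m[1][0]

def objective(w1, w2, r):
    """C(W1, W2, R) = ln(|I + W1 R| / |I + W2 R|), in nats."""
    return math.log(det_i_plus(w1, r) / det_i_plus(w2, r))

def largest_eig_sym2(w):
    """Largest eigenvalue of a real symmetric 2x2 matrix (its spectral norm if PSD)."""
    t = w[0][0] + w[1][1]
    d = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    return t / 2 + math.sqrt(max(t * t / 4 - d, 0.0))

# Hypothetical example values.
eps = 0.5
W1 = [[4.0, 1.0], [1.0, 3.0]]     # legitimate channel Gram matrix
W2 = [[0.3, 0.1], [0.1, 0.2]]     # some eavesdropper with |W2|_2 <= eps
W2_iso = [[eps, 0.0], [0.0, eps]]  # isotropic eavesdropper eps * I
R = [[1.0, 0.4], [0.4, 2.0]]       # transmit covariance

print(largest_eig_sym2(W2) <= eps)
print(objective(W1, W2, R) >= objective(W1, W2_iso, R))
```

Both checks hold for these numbers, consistent with the isotropic eavesdropper of maximum gain being the least favorable one for any fixed $\mathbf{W}_1$ and $\mathbf{R}$.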


E. Proof of Theorem 7 

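Using an argument similar to that in (70), it follows that

$$\begin{aligned}
\ln|\mathbf{I} + \mathbf{W}_2\mathbf{R}^*| &= \sum_{i=1}^{r_2} \ln(1 + \lambda_i(\mathbf{W}_2\mathbf{R}^*)) \\
&\le \sum_{i=1}^{r_1} \ln(1 + \lambda_i(\mathbf{W}_2)\lambda_i(\mathbf{R}^*)) \\
&\le \sum_{i=1}^{r_1} \ln(1 + \epsilon^2\lambda_i(\mathbf{R}^*)) \\
&= \ln|\mathbf{I} + \mathbf{W}_{2w}\mathbf{R}^*|
\end{aligned} \tag{100}$$

and

$$\begin{aligned}
\ln|\mathbf{I} + \mathbf{W}_1\mathbf{R}^*| &= \sum_{i=1}^{r_1} \ln(1 + \lambda_i(\mathbf{W}_1\mathbf{R}^*)) \\
&\overset{(a)}{=} \sum_{i=1}^{r_1} \ln(1 + \sigma_i^2((\mathbf{H}_0 + \Delta\mathbf{H})\mathbf{R}^{*1/2})) \\
&\overset{(b)}{\ge} \sum_{i=1}^{r_1} \ln(1 + (\sigma_i(\mathbf{H}_0) - \epsilon_1)_+^2\lambda_i(\mathbf{R}^*)) \\
&= \ln|\mathbf{I} + \mathbf{W}_{1w}\mathbf{R}^*|
\end{aligned} \tag{101}$$

for any $\mathbf{W}_1 \in \mathcal{S}_1$ and $\mathbf{W}_2 \in \mathcal{S}_2$, where (a) follows from $\lambda_i(\mathbf{R}) = \sigma_i^2(\mathbf{R}^{1/2})$ and (b) follows from the singular value inequalities in Lemma 7. Combining these two chains of inequalities, one obtains

$$C(\mathbf{W}_{1w}, \mathbf{W}_{2w}, \mathbf{R}^*) \le C(\mathbf{W}_1, \mathbf{W}_2, \mathbf{R}^*) \tag{102}$$

which establishes the 2nd inequality in (77). The 1st inequality follows from the fact that $\mathbf{R}^*$ is the optimal covariance under $\mathbf{W}_1 = \mathbf{W}_{1w}$ and $\mathbf{W}_2 = \mathbf{W}_{2w}$. Since the saddle-point inequalities in (77) are equivalent to max min = min max in (76) (see e.g. [33]), this also establishes the latter claim. ■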